* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-11-10 12:09 ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-11-10 12:09 UTC (permalink / raw)
  To: Ard Biesheuvel, akahiro.akashi-QSEj5FYQhm4dnm+yROfE0A
  Cc: Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland,
	james.morse-5wv7dgnIgG8, Bhupesh SHARMA

Hi Ard, Akashi

I have hit an issue on an arm64 board using the latest master branch from Linus.

I have a dirty hack that avoids the issue, but I would like your
opinions, as it might break crashkernel dumps on other arm64
machines.

1. This arm64 machine supports only the ACPI boot mode (i.e. acpi=force
is always set in bootargs).

2. Since f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
ACPI reclaim memory as MEMBLOCK_NOMAP), Ard rightly added a chunk
which marks the 'EFI_ACPI_RECLAIM_MEMORY' regions as usable memory
(by setting the EFI_MEMORY_WB flag for such EFI memory descriptors,
thus marking them as System RAM).

3. However this causes crashkernel booting on ACPI only machines to fail:

[    0.039205] ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
[    0.095098] Internal error: Oops: 96000021 [#1] SMP
[    0.100022] Modules linked in:
[    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
[    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
[    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
[    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
[    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045
[    0.132647] sp : ffff000008ccfb40
[    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
[    0.141354] x27: ffff0000088be820 x26: 0000000000000000
[    0.146718] x25: 000000000000001b x24: 0000000000000001
[    0.152083] x23: 0000000000000001 x22: ffff000009710027
[    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
[    0.162812] x19: 000000000000001b x18: 0000000000000005
[    0.168176] x17: 0000000000000000 x16: 0000000000000000
[    0.173541] x15: 0000000000000000 x14: 000000000000038e
[    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
[    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
[    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
[    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
[    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
[    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
[    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
[    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
[    0.223224] Call trace:
[    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
[    0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0
[    0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000
[    0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006
[    0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
[    0.263843] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
[    0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001
[    0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
[    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40
[    0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18
[    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764
[    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
[    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
[    0.394500] ---[ end trace c46ed37f9651c58e ]---
[    0.399160] Kernel panic - not syncing: Fatal exception
[    0.404437] Rebooting in 10 seconds..
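
The Oops above can be decoded by hand. The following is only a minimal
sketch: the ESR and PTE field positions follow the ARMv8-A layout, while
the memory type names assume the arm64 kernel's MT_* index ordering of
this era (an assumption, not something shown in the log). Under those
assumptions the fault is an alignment fault on a read, through a PTE
whose attribute index selects a Device memory type, and x22 is not
4-byte aligned.

```python
# Decode the ESR, the leaf PTE and the x22 register from the Oops.
ESR = 0x96000021            # "Internal error: Oops: 96000021"
PTE = 0x00E8000039710707    # "*pte=00e8000039710707"
X22 = 0xFFFF000009710027    # x22 in the register dump

# Assumed MAIR index ordering (arm64 MT_* constants of this era).
MT_NAMES = ["DEVICE_nGnRnE", "DEVICE_nGnRE", "DEVICE_GRE",
            "NORMAL_NC", "NORMAL", "NORMAL_WT"]


def decode_esr(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "wnr": (iss >> 6) & 0x1,    # 1 = write, 0 = read
        "dfsc": iss & 0x3F,         # data fault status code
    }


def decode_pte(pte):
    """Extract the attribute index and output address from a leaf PTE."""
    return {
        "attr_index": (pte >> 2) & 0x7,
        # Bits [47:12]; the low address bits are zero here, so the
        # translation granule does not change the result.
        "output_address": pte & 0x0000FFFFFFFFF000,
    }


esr = decode_esr(ESR)
pte = decode_pte(PTE)
print("EC=0x%02x DFSC=0x%02x WnR=%d" % (esr["ec"], esr["dfsc"], esr["wnr"]))
print("PTE memory type:", MT_NAMES[pte["attr_index"]])
print("PTE output address: 0x%x" % pte["output_address"])
print("x22 offset within a 4-byte word:", X22 & 0x3)
```

EC 0x25 is a data abort taken without a change in exception level, and
DFSC 0x21 is an alignment fault. The PTE's output address matches the
start of the first ACPI Reclaim Memory region listed in step 4.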

4. On the primary kernel boot, with efi=debug, I see that the ACPI
regions are properly recognized as Reclaim regions:

...
[    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] efi:   0x000039770000-0x0000397affff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
...
[    0.000000] efi:   0x0000398a0000-0x0000398bffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]

And appear correctly as early memory node ranges:

[    0.000000] Early memory node ranges
...
[    0.000000]   node   0: [mem 0x00000000396c0000-0x000000003975ffff]
[    0.000000]   node   0: [mem 0x0000000039760000-0x000000003976ffff]
[    0.000000]   node   0: [mem 0x0000000039770000-0x00000000397affff]
[    0.000000]   node   0: [mem 0x00000000397b0000-0x000000003989ffff]
[    0.000000]   node   0: [mem 0x00000000398a0000-0x00000000398bffff]

5. However, when the crashkernel is booted, I see that although the
same regions are recognized as Reclaim regions, they do not appear in
the "Early memory node ranges" entries:

[  141.348355] Starting crashdump kernel...
[  141.352269] Bye!
...
[    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] efi:   0x000039770000-0x0000397affff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
...
[    0.000000] efi:   0x0000398a0000-0x0000398bffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]

...
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x000000000e800000-0x000000002e7fffff]
[    0.000000]   node   0: [mem 0x0000000039620000-0x00000000396bffff]
[    0.000000]   node   0: [mem 0x0000000039760000-0x000000003976ffff]
[    0.000000]   node   0: [mem 0x00000000397b0000-0x000000003989ffff]
[    0.000000]   node   0: [mem 0x00000000398c0000-0x0000000039d3ffff]
[    0.000000]   node   0: [mem 0x000000003ed30000-0x000000003ed5ffff]

( ^^^ No entry for ACPI Reclaim Memory regions)
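
The comparison in steps 4 and 5 can be checked mechanically. This is a
small sketch using only the addresses quoted in the logs above. The
helper names are made up for illustration, and the primary kernel's
range list is partly elided in the log, so only the ranges actually
shown are used.

```python
# ACPI Reclaim Memory regions reported by efi=debug (both kernels).
ACPI_RECLAIM = [
    (0x39710000, 0x3975FFFF),
    (0x39770000, 0x397AFFFF),
    (0x398A0000, 0x398BFFFF),
]

# "Early memory node ranges" shown for the primary kernel (partial list).
PRIMARY_RANGES = [
    (0x396C0000, 0x3975FFFF),
    (0x39760000, 0x3976FFFF),
    (0x39770000, 0x397AFFFF),
    (0x397B0000, 0x3989FFFF),
    (0x398A0000, 0x398BFFFF),
]

# "Early memory node ranges" shown for the crashkernel.
CRASH_RANGES = [
    (0x0E800000, 0x2E7FFFFF),
    (0x39620000, 0x396BFFFF),
    (0x39760000, 0x3976FFFF),
    (0x397B0000, 0x3989FFFF),
    (0x398C0000, 0x39D3FFFF),
    (0x3ED30000, 0x3ED5FFFF),
]


def is_covered(region, ranges):
    """True if the region lies entirely inside a single memory range."""
    start, end = region
    return any(lo <= start and end <= hi for lo, hi in ranges)


def missing_regions(regions, ranges):
    """Return the regions that no memory range fully contains."""
    return [r for r in regions if not is_covered(r, ranges)]


for name, ranges in (("primary", PRIMARY_RANGES), ("crash", CRASH_RANGES)):
    missing = missing_regions(ACPI_RECLAIM, ranges)
    print(name, "kernel is missing:",
          ["0x%x-0x%x" % r for r in missing])
```

The containment test is deliberately simple: it does not merge adjacent
ranges, which is sufficient for the addresses quoted here.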

6a. Also I see that during the primary kernel boot:

'acpi_os_ioremap' calls 'ioremap_cache' for the ACPI Reclaim Memory regions.

6b. But during the crashkernel boot, 'acpi_os_ioremap' calls
'ioremap' for the ACPI Reclaim Memory regions and not the _cache
variant.
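
The difference between 6a and 6b would follow from the missing memory
ranges if, as assumed here, arm64's 'acpi_os_ioremap' picks the cached
variant only for addresses that memblock reports as memory. The sketch
below is a user-space model of that assumed decision, not kernel code;
the function names and the reduced range lists are illustrative.

```python
# Memory ranges known to each kernel around the first ACPI Reclaim
# region (taken from the "Early memory node ranges" output above).
PRIMARY_MEMORY = [(0x396C0000, 0x3975FFFF), (0x39760000, 0x3976FFFF)]
CRASH_MEMORY = [(0x39620000, 0x396BFFFF), (0x39760000, 0x3976FFFF)]


def is_memory(phys, memory_ranges):
    """Model of a memblock lookup: is this address registered as memory?"""
    return any(lo <= phys <= hi for lo, hi in memory_ranges)


def acpi_mapping_choice(phys, memory_ranges):
    """Assumed policy: cached mapping for memory, device mapping otherwise."""
    return "ioremap_cache" if is_memory(phys, memory_ranges) else "ioremap"


table_phys = 0x39710000  # start of the first ACPI Reclaim Memory region
print("primary kernel:", acpi_mapping_choice(table_phys, PRIMARY_MEMORY))
print("crash kernel:  ", acpi_mapping_choice(table_phys, CRASH_MEMORY))
```

Under this model, the crashkernel maps the ACPI tables as device memory
simply because the region is absent from its memory list.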

7. I think this is because of how the memblock regions are mapped in
'arch/arm64/mm/mmu.c':

I am not quite sure if I fully understand the trick we have presently
inside '__init map_mem(pgd_t *pgd)', but reading the comment below:

    /*
     * Take care not to create a writable alias for the
     * read-only text and rodata sections of the kernel image.
     * So temporarily mark them as NOMAP to skip mappings in
     * the following for-loop
     */

I think we are marking only the kernel text and crashkernel regions
with NO_CONT_MAPPINGS, or with both NO_CONT_MAPPINGS and
NO_BLOCK_MAPPINGS.

8. Also, I think the crashkernel handling changed by
e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
memblock regions explicitly in iomem) now needs to be updated to
account for Ard's change, in order to fix this issue on ACPI-only
machines.

I have a dirty hack in place, but I would like your opinions on what
a more concrete fix for this issue would be (as we now mark these
regions as System RAM rather than NOMAP). I don't currently have a
DTB-based machine to test on.

Please share your views.

Regards,
Bhupesh

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-10 12:09 ` Bhupesh Sharma
@ 2017-11-10 12:11     ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-11-10 12:11 UTC (permalink / raw)
  To: Ard Biesheuvel, takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A
  Cc: Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland,
	james.morse-5wv7dgnIgG8, Bhupesh SHARMA

Resent with Akashi's correct email address.


^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-11-10 12:11     ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-11-10 12:11 UTC (permalink / raw)
  To: linux-arm-kernel

Resent with Akashi's correct email address.

On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> Hi Ard, Akashi
>
> I have met an issue on an arm64 board using the latest master branch from Linus.
>
> I think I have a dirty hack to avoid the issue, but would want more
> opinions from you as it might break crashkernel dump on other arm64
> machines.
>
> 1. This arm64 machine supports acpi only boot mode (i.e. acpi=force is
> always set in bootargs)
>
> 2. Since f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
> ACPI reclaim memory as MEMBLOCK_NOMAP), Ard rightly added a chunk
> which marks the 'EFI_ACPI_RECLAIM_MEMORY' regions as useable memory
> (by setting the EFI_MEMORY_WB flag for such efi memory descriptors
> thus marking them as System RAM).
>
> 3. However this causes crashkernel booting on ACPI only machines to fail:
>
> [    0.039205] ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> [    0.100022] Modules linked in:
> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> pstate: 60000045
> [    0.132647] sp : ffff000008ccfb40
> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> [    0.146718] x25: 000000000000001b x24: 0000000000000001
> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> [    0.162812] x19: 000000000000001b x18: 0000000000000005
> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> [    0.223224] Call trace:
> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> [    0.232194] fa00: 0000000000000000 ffff000009710027
> ffff0000095e3980 ffff000008ccfbe0
> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> ffff000008ccfc50 0000000000000000
> [    0.248018] fa40: ffff8000126d0140 000000000000005f
> 00000000ffffff76 0000000000000006
> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> 000000000000038e 0000000000000000
> [    0.263843] fa80: 0000000000000000 0000000000000000
> 0000000000000005 000000000000001b
> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> ffff000009710027 0000000000000001
> [    0.279667] fac0: 0000000000000001 000000000000001b
> 0000000000000000 ffff0000088be820
> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> ffff00000849b4f8 ffff000008ccfb40
> [    0.295491] fb00: ffff0000084a6764 0000000060000045
> ffff000008ccfb40 ffff000008260a18
> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> ffff000008ccfb40 ffff0000084a6764
> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> [    0.399160] Kernel panic - not syncing: Fatal exception
> [    0.404437] Rebooting in 10 seconds..
>
> 4. On the primary kernel boot, I notice with efi=debug that while the
> ACPI regions are properly recognized as Reclaim regions:
>
> ...
> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim
> Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [    0.000000] efi:   0x000039770000-0x0000397affff [ACPI Reclaim
> Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> ...
> [    0.000000] efi:   0x0000398a0000-0x0000398bffff [ACPI Reclaim
> Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>
> And appear correctly as early memory node ranges:
>
> [    0.000000] Early memory node ranges
> ...
> [    0.000000]   node   0: [mem 0x00000000396c0000-0x000000003975ffff]
> [    0.000000]   node   0: [mem 0x0000000039760000-0x000000003976ffff]
> [    0.000000]   node   0: [mem 0x0000000039770000-0x00000000397affff]
> [    0.000000]   node   0: [mem 0x00000000397b0000-0x000000003989ffff]
> [    0.000000]   node   0: [mem 0x00000000398a0000-0x00000000398bffff]
>
> 5. However when the crashkernel is boot'ed I see that although the
> same regions are recognized as Reclaim regions they do not appear in
> the "Early Memory node range" entries:
>
> [  141.348355] Starting crashdump kernel...
> [  141.352269] Bye!
> ...
> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim
> Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [    0.000000] efi:   0x000039770000-0x0000397affff [ACPI Reclaim
> Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> ...
> [    0.000000] efi:   0x0000398a0000-0x0000398bffff [ACPI Reclaim
> Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>
> ...
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x000000000e800000-0x000000002e7fffff]
> [    0.000000]   node   0: [mem 0x0000000039620000-0x00000000396bffff]
> [    0.000000]   node   0: [mem 0x0000000039760000-0x000000003976ffff]
> [    0.000000]   node   0: [mem 0x00000000397b0000-0x000000003989ffff]
> [    0.000000]   node   0: [mem 0x00000000398c0000-0x0000000039d3ffff]
> [    0.000000]   node   0: [mem 0x000000003ed30000-0x000000003ed5ffff]
>
> ( ^^^ No entry for ACPI Reclaim Memory regions)
>
> 6a. Also I see that during the primary kernel boot:
>
> 'acpi_os_ioremap' calls 'ioremap_cache' for the ACPI Reclaim Memory regions.
>
> 6b. But during the crashkernel boot, 'acpi_os_ioremap' calls
> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> variant.
>
> 7. I think this is because of how the memblock regions are mapped in
> 'arch/arm64/mm/mmu.c':
>
> I am not quite sure if I fully understand the trick we have presently
> inside '__init map_mem(pgd_t *pgd)', but reading the comment below:
>
>     /*
>      * Take care not to create a writable alias for the
>      * read-only text and rodata sections of the kernel image.
>      * So temporarily mark them as NOMAP to skip mappings in
>      * the following for-loop
>      */
>
> I think we are marking only the kernel text and crashkernel regions
> with NO_CONT_MAPPINGS or NO_CONT_MAPPINGS and NO_BLOCK_MAPPINGS
>
> 8. Also, I think now the crashkernel handling changed by
> e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
> memblock regions explicitly in iomem), needs to be changed to handle
> the change added by Ard to fix this issue on ACPI only machines.
>
> I have a dirty hack in place, but I would like to have your opinions
> about what can be a more concrete fix to this issue (as we mark these
> regions as System RAM now rather than NOMAP) and I don't have a DTB
> based machine to test on currently.
>
> Please share your views.
>
> Regards,
> Bhupesh
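
Points 5, 6a and 6b above can be tied together with a small user-space
model. It is only a sketch: it assumes the arm64 helper picks a cacheable
mapping when the physical address is known to memblock as memory and a
device mapping otherwise. The names 'range_is_memory' and 'choose_mapping'
are hypothetical; the ranges are copied from the two "Early memory node
ranges" listings quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One "Early memory node ranges" entry; both bounds are inclusive. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Ranges copied from the primary kernel log (the list is abridged there). */
static const struct mem_range primary_ranges[] = {
	{ 0x396c0000, 0x3975ffff },
	{ 0x39760000, 0x3976ffff },
	{ 0x39770000, 0x397affff },
	{ 0x397b0000, 0x3989ffff },
	{ 0x398a0000, 0x398bffff },
};

/* Ranges copied from the crashkernel log. */
static const struct mem_range crash_ranges[] = {
	{ 0x0e800000, 0x2e7fffff },
	{ 0x39620000, 0x396bffff },
	{ 0x39760000, 0x3976ffff },
	{ 0x397b0000, 0x3989ffff },
	{ 0x398c0000, 0x39d3ffff },
	{ 0x3ed30000, 0x3ed5ffff },
};

enum map_kind { MAP_DEVICE, MAP_CACHED };

/* Hypothetical stand-in for a "is this address memory?" lookup. */
static bool range_is_memory(const struct mem_range *r, size_t n, uint64_t phys)
{
	assert(r != NULL);
	for (size_t i = 0; i < n; i++)
		if (phys >= r[i].start && phys <= r[i].end)
			return true;
	return false;
}

/* Known memory gets a cacheable mapping, anything else a device mapping. */
static enum map_kind choose_mapping(const struct mem_range *r, size_t n,
				    uint64_t phys)
{
	return range_is_memory(r, n, phys) ? MAP_CACHED : MAP_DEVICE;
}
```

Under these assumptions, 0x39710000 (the first ACPI Reclaim region) is
mapped cacheable with the primary kernel's ranges and as device memory
with the crashkernel's ranges, which matches 6a and 6b.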

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-10 12:11     ` Bhupesh Sharma
@ 2017-11-13  9:27         ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-11-13  9:27 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland,
	james.morse-5wv7dgnIgG8, Bhupesh SHARMA

Hi,

On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote:
> Resent with Akashi's correct email address.
> 
> On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > Hi Ard, Akashi
> >
> > I have met an issue on an arm64 board using the latest master branch from Linus.
  (snip)
> >
> > 8. Also, I think now the crashkernel handling changed by
> > e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
> > memblock regions explicitly in iomem), needs to be changed to handle
> > the change added by Ard to fix this issue on ACPI only machines.
> >
> > I have a dirty hack in place, but I would like to have your opinions
> > about what can be a more concrete fix to this issue (as we mark these
> > regions as System RAM now rather than NOMAP) and I don't have a DTB
> > based machine to test on currently.

I don't know much about acpi reclaim regions,
can you please tell me how your change affects your panic case?

Thanks,
-Takahiro AKASHI


> > Please share your views.
> >
> > Regards,
> > Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-13  9:27         ` AKASHI Takahiro
@ 2017-11-14 11:20             ` Ard Biesheuvel
  -1 siblings, 0 replies; 135+ messages in thread
From: Ard Biesheuvel @ 2017-11-14 11:20 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	Bhupesh SHARMA

On 13 November 2017 at 09:27, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> Hi,
>
> On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote:
>> Resent with Akashi's correct email address.
>>
>> On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > Hi Ard, Akashi
>> >
>> > I have met an issue on an arm64 board using the latest master branch from Linus.
>   (snip)
>> >
>> > 8. Also, I think now the crashkernel handling changed by
>> > e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
>> > memblock regions explicitly in iomem), needs to be changed to handle
>> > the change added by Ard to fix this issue on ACPI only machines.
>> >
>> > I have a dirty hack in place, but I would like to have your opinions
>> > about what can be a more concrete fix to this issue (as we mark these
>> > regions as System RAM now rather than NOMAP) and I don't have a DTB
>> > based machine to test on currently.
>
> I don't know much about acpi reclaim regions,
> can you please tell me how your change affects your panic case?
>

Does this help at all?

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 7768423b39d3..61d867647cca 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -213,7 +213,7 @@ static void __init request_standard_resources(void)

        for_each_memblock(memory, region) {
                res = alloc_bootmem_low(sizeof(*res));
-               if (memblock_is_nomap(region)) {
+               if (memblock_is_nomap(region) || memblock_is_reserved(region)) {
                        res->name  = "reserved";
                        res->flags = IORESOURCE_MEM;
                } else {

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-14 11:20             ` Ard Biesheuvel
@ 2017-11-15 10:58                 ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-11-15 10:58 UTC (permalink / raw)
  To: Ard Biesheuvel, AKASHI Takahiro, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	Bhupesh SHARMA

Hi Ard, Akashi,

On 11/14/2017 04:50 PM, Ard Biesheuvel wrote:
> On 13 November 2017 at 09:27, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> Hi,
>>
>> On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote:
>>> Resent with Akashi's correct email address.
>>>
>>> On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>>> Hi Ard, Akashi
>>>>
>>>> I have met an issue on an arm64 board using the latest master branch from Linus.
>>   (snip)
>>>>
>>>> 8. Also, I think now the crashkernel handling changed by
>>>> e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
>>>> memblock regions explicitly in iomem), needs to be changed to handle
>>>> the change added by Ard to fix this issue on ACPI only machines.
>>>>
>>>> I have a dirty hack in place, but I would like to have your opinions
>>>> about what can be a more concrete fix to this issue (as we mark these
>>>> regions as System RAM now rather than NOMAP) and I don't have a DTB
>>>> based machine to test on currently.
>>
>> I don't know much about acpi reclaim regions,
>> can you please tell me how your change affects your panic case?

Sorry, I was away yesterday and couldn't get back with the details of the
dirty hack. I see Ard has already proposed the following change. It looks
similar to the change I made locally; however, that doesn't fix the issue
completely at my end so far.

Here are more details:

>
> Does this help at all?
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 7768423b39d3..61d867647cca 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -213,7 +213,7 @@ static void __init request_standard_resources(void)
>
>         for_each_memblock(memory, region) {
>                 res = alloc_bootmem_low(sizeof(*res));
> -               if (memblock_is_nomap(region)) {
> +               if (memblock_is_nomap(region) || memblock_is_reserved(region)) {
>                         res->name  = "reserved";
>                         res->flags = IORESOURCE_MEM;
>                 } else {
>

.. So, I tried using the 'memblock_is_reserved' check in
'request_standard_resources'. However, as 'memblock_is_reserved' expects a
physical address as its input argument, I changed mine to something like this:

-               if (memblock_is_nomap(region)) {
+               if (memblock_is_nomap(region) ||
+                   memblock_is_reserved(__pfn_to_phys(memblock_region_reserved_base_pfn(region)))) {
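
To make explicit which address that expression hands to
'memblock_is_reserved', here is a minimal user-space sketch of the
pfn/phys round trip. It assumes 64k pages (as reported in the crash log);
'pfn_down', 'pfn_up' and 'pfn_to_phys' are hypothetical stand-ins for the
kernel macros, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16	/* assumption: 64k pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Round a physical address down to its page frame number. */
static uint64_t pfn_down(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/* Round a physical address up to the next page frame number. */
static uint64_t pfn_up(uint64_t phys)
{
	return (phys + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Convert a page frame number back to a physical address. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	assert(pfn <= (UINT64_MAX >> SKETCH_PAGE_SHIFT));
	return pfn << SKETCH_PAGE_SHIFT;
}
```

For a page-aligned region base such as 0x396c0000, rounding up or down
gives the same pfn, and converting back yields 0x396c0000 again. So the
check only ever asks about the first byte of the region.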

However, I am still hitting the issue, and it is quite a peculiar one.
First, some more background on what is happening on this
Huawei Taishan arm64 board that I have:

1a. I see from the boot logs that one of the ACPI tables (DSDT) is at 
phy addr 0x39710000:

# dmesg | grep -i "DSDT"
[    0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI   HIP07 00000000 INTL 20151124)

1b. This DSDT table is correctly marked as ACPI Reclaim memory.
However, I see that just preceding this entry there is also a 'Boot Code'
entry covering '0x0000396c0000-0x00003970ffff':

# dmesg | grep -B 2 -i "ACPI reclaim"
[    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]

2. I am not sure which kernel layer does this (I am still digging),
but I see that the 'Boot Code' and ACPI DSDT table regions are somehow
merged into one memblock_region and appear as the range
'396c0000-3975ffff' in '/proc/iomem':

# cat /proc/iomem | grep -A 2 -B 2 39
00000000-3961ffff : System RAM
   00080000-00b6ffff : Kernel code
   00cb0000-0167ffff : Kernel data
   0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : System RAM
39760000-3976ffff : reserved
39770000-397affff : reserved
397b0000-3989ffff : reserved
398a0000-398bffff : reserved
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM
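
One plausible explanation for the merge, offered as an assumption rather
than a confirmed diagnosis: memblock coalesces neighbouring regions when
they touch and carry the same flags. Once ACPI Reclaim memory is no longer
NOMAP, it has the same flags as the Boot Code range next to it. The sketch
below models that rule in user space; 'struct region' and 'merge_regions'
are hypothetical names, and node ids are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct region {
	uint64_t base;
	uint64_t size;
	unsigned long flags;	/* 0 = plain memory, 1 = a NOMAP-style flag */
};

/*
 * Merge neighbours in a sorted array when they touch and carry the same
 * flags. Returns the new number of regions.
 */
static size_t merge_regions(struct region *r, size_t cnt)
{
	size_t i = 0;

	assert(r != NULL);
	while (i + 1 < cnt) {
		struct region *cur = &r[i];
		struct region *next = &r[i + 1];

		if (cur->base + cur->size != next->base ||
		    cur->flags != next->flags) {
			i++;
			continue;
		}
		cur->size += next->size;
		/* Close the gap left by the absorbed neighbour. */
		memmove(next, next + 1, (cnt - (i + 2)) * sizeof(*next));
		cnt--;
	}
	return cnt;
}
```

Feeding it Boot Code (0x396c0000, size 0x50000) followed by ACPI Reclaim
(0x39710000, size 0x50000), both with flags 0, yields a single region
0x396c0000-0x3975ffff, which is the range seen in '/proc/iomem'.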

3. As to why this merged region appears as a System RAM area rather
than a RESERVED one, the following code path explains it:

3a. The check we added in 'arch/arm64/kernel/setup.c' doesn't handle the
ACPI DSDT table properly and fails to mark it as 'RESERVED'. This is because
'memblock_is_reserved' calls 'memblock_search' internally, which is
currently implemented as:

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
{
	unsigned int left = 0, right = type->cnt;

	do {
		unsigned int mid = (right + left) / 2;

		if (addr < type->regions[mid].base)
			right = mid;
		else if (addr >= (type->regions[mid].base +
				  type->regions[mid].size))
			left = mid + 1;
		else
			return mid;
	} while (left < right);
	return -1;
}

3b. The 'addr' passed to 'memblock_search', calculated via
'__pfn_to_phys(memblock_region_memory_base_pfn(region))', is 0x396c0000 in
this case (see the iomem entry in point 2 above), so we never see that
this memblock is reserved for the ACPI DSDT entry at 0x39710000.
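
The difference between asking about one address and asking about a whole
range can be shown with a user-space sketch. It reuses the binary search
quoted above over a plain array. The reserved layout (one region at
0x39710000 of size 0x50000) is an assumption for illustration, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
};

/* Same binary search as the quoted memblock_search(), over a plain array. */
static int region_search(const struct region *r, unsigned int cnt,
			 uint64_t addr)
{
	unsigned int left = 0, right = cnt;

	assert(r != NULL && cnt > 0);
	do {
		unsigned int mid = (right + left) / 2;

		if (addr < r[mid].base)
			right = mid;
		else if (addr >= r[mid].base + r[mid].size)
			left = mid + 1;
		else
			return (int)mid;
	} while (left < right);
	return -1;
}

/* Point query: is this single address inside a reserved region? */
static bool addr_is_reserved(const struct region *r, unsigned int cnt,
			     uint64_t addr)
{
	return region_search(r, cnt, addr) != -1;
}

/* Range query: does [base, base + size) overlap any reserved region? */
static bool range_overlaps_reserved(const struct region *r, unsigned int cnt,
				    uint64_t base, uint64_t size)
{
	for (unsigned int i = 0; i < cnt; i++)
		if (base < r[i].base + r[i].size && r[i].base < base + size)
			return true;
	return false;
}
```

With that layout, the point query at 0x396c0000 reports "not reserved",
while the range query over the merged region (0x396c0000, size 0xa0000)
reports an overlap. The kernel has a range-based helper,
'memblock_is_region_reserved(base, size)'; whether using it here is the
right fix is not established in this thread.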

4. Now, when we run the kexec-tools to load a crashdump kernel, it 
doesn't find an entry for the ACPI DSDT table in the reserved range (but 
instead finds it as a System RAM range):

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d

...
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 00000000396c0000 - 000000003975ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 0000000039770000 - 00000000397affff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398a0000 - 00000000398bffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
..
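
For readers who want to reproduce the check by hand, here is a minimal
sketch of parsing '/proc/iomem'-style lines and testing which label an
address falls under. It is not the kexec-tools implementation;
'struct iomem_entry', 'parse_iomem_line' and 'addr_in_named_entry' are
hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct iomem_entry {
	unsigned long long start;
	unsigned long long end;
	char name[64];
};

/*
 * Parse one line such as "396c0000-3975ffff : System RAM".
 * Returns 1 on success, 0 if the line does not match.
 */
static int parse_iomem_line(const char *line, struct iomem_entry *out)
{
	int consumed = 0;

	assert(line != NULL && out != NULL);
	if (sscanf(line, " %llx-%llx : %n", &out->start, &out->end,
		   &consumed) != 2)
		return 0;
	if (consumed == 0)
		return 0;	/* the " : " separator was missing */
	/* Copy the label and drop a trailing newline, if any. */
	snprintf(out->name, sizeof(out->name), "%s", line + consumed);
	out->name[strcspn(out->name, "\n")] = '\0';
	return 1;
}

/* Return 1 if addr lies inside the entry and the entry has this label. */
static int addr_in_named_entry(const struct iomem_entry *e,
			       unsigned long long addr, const char *name)
{
	return addr >= e->start && addr <= e->end &&
	       strcmp(e->name, name) == 0;
}
```

Applied to the line "396c0000-3975ffff : System RAM", the DSDT address
0x39710000 is found under "System RAM" and not under "reserved", which is
what the kexec debug output above shows.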

5. Now when a crash is issued to boot the crashkernel, we see it panic 
while trying to access the acpi tables (note that the logs below have 
been snipped for clarity):

# echo c > /proc/sysrq-trigger

...
[  419.495621] Bye!
...
[    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
...
[    0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI   HIP07 00000000 INTL 20151124)
...
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000010200000-0x00000000301fffff]
[    0.000000]   node   0: [mem 0x0000000039620000-0x00000000396bffff]
[    0.000000]   node   0: [mem 0x0000000039760000-0x000000003976ffff]
[    0.000000]   node   0: [mem 0x00000000397b0000-0x000000003989ffff]
[    0.000000]   node   0: [mem 0x00000000398c0000-0x0000000039d3ffff]
[    0.000000]   node   0: [mem 0x000000003ed30000-0x000000003ed5ffff]
...
[    0.039309] ACPI: Core revision 20170728
[    0.044383] Unable to handle kernel paging request at virtual address ffff000009f10027
[    0.052386] Mem abort info:
[    0.055201]   Exception class = DABT (current EL), IL = 32 bits
[    0.061179]   SET = 0, FnV = 0
[    0.064258]   EA = 0, S1PTW = 0
[    0.067424] Data abort info:
[    0.070326]   ISV = 0, ISS = 0x00000021
[    0.074195]   CM = 0, WnR = 0
[    0.077187] swapper pgtable: 64k pages, 48-bit VAs, pgd = ffff000009650000
[    0.084133] [ffff000009f10027] *pgd=00000000301d0003, *pud=00000000301d0003, *pmd=00000000301c0003, *pte=00e8000039710707
[    0.095215] Internal error: Oops: 96000021 [#1] SMP
[    0.100139] Modules linked in:
[    0.103219] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0+ #30
[    0.109373] task: ffff000008d05580 task.stack: ffff000008cc0000
[    0.115356] PC is at acpi_ns_lookup+0x25c/0x3c0
[    0.119929] LR is at acpi_ds_load1_begin_op+0xa4/0x294
[    0.125117] pc : [<ffff0000084a862c>] lr : [<ffff00000849d3c0>] pstate: 60000045
[    0.132589] sp : ffff000008ccfb40
[    0.135930] x29: ffff000008ccfb40 x28: ffff000008a9c18c
[    0.141295] x27: ffff0000088be820 x26: 0000000000000000
[    0.146659] x25: 000000000000001b x24: 0000000000000001
[    0.152024] x23: 0000000000000001 x22: ffff000009f10027
[    0.157389] x21: ffff000008ccfc50 x20: 0000000000000001
[    0.162753] x19: 000000000000001b x18: 0000000000000005
[    0.168117] x17: 0000000000000000 x16: 0000000000000000
[    0.173481] x15: 0000000000000000 x14: 000000000000038e
[    0.178846] x13: ffffffff00000000 x12: ffffffffffffffff
[    0.184210] x11: 0000000000000006 x10: 00000000ffffff76
[    0.189574] x9 : 000000000000005f x8 : ffff800014670140
[    0.194939] x7 : 0000000000000000 x6 : ffff000008ccfc50
[    0.200303] x5 : ffff800012d45000 x4 : 0000000000000001
[    0.205668] x3 : ffff000008ccfbe0 x2 : ffff0000095e3a00
[    0.211032] x1 : ffff000009f10027 x0 : 0000000000000000
[    0.216397] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
[    0.223166] Call trace:
[    0.225629] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
[    0.232136] fa00: 0000000000000000 ffff000009f10027 ffff0000095e3a00 ffff000008ccfbe0
[    0.240048] fa20: 0000000000000001 ffff800012d45000 ffff000008ccfc50 0000000000000000
[    0.247960] fa40: ffff800014670140 000000000000005f 00000000ffffff76 0000000000000006
[    0.255872] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
[    0.263785] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
[    0.271697] faa0: 0000000000000001 ffff000008ccfc50 ffff000009f10027 0000000000000001
[    0.279609] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
[    0.287521] fae0: ffff000008a9c18c ffff000008ccfb40 ffff00000849d3c0 ffff000008ccfb40
[    0.295433] fb00: ffff0000084a862c 0000000060000045 ffff000008ccfb40 ffff000008261918
[    0.303345] fb20: ffffffffffffffff ffff0000087f193c ffff000008ccfb40 ffff0000084a862c
[    0.311258] [<ffff0000084a862c>] acpi_ns_lookup+0x25c/0x3c0
[    0.316885] [<ffff00000849d3c0>] acpi_ds_load1_begin_op+0xa4/0x294
[    0.323128] [<ffff0000084af374>] acpi_ps_build_named_op+0xc4/0x198
[    0.329371] [<ffff0000084af594>] acpi_ps_create_op+0x14c/0x270
[    0.335262] [<ffff0000084aee70>] acpi_ps_parse_loop+0x188/0x5c8
[    0.341241] [<ffff0000084aff10>] acpi_ps_parse_aml+0xb0/0x2b8
[    0.347044] [<ffff0000084aacd8>] acpi_ns_one_complete_parse+0x144/0x184
[    0.353726] [<ffff0000084aad60>] acpi_ns_parse_table+0x48/0x68
[    0.359616] [<ffff0000084aa194>] acpi_ns_load_table+0x4c/0xdc
[    0.365420] [<ffff0000084b51c0>] acpi_tb_load_namespace+0xe4/0x264
[    0.371664] [<ffff000008bafd64>] acpi_load_tables+0x48/0xc0
[    0.377292] [<ffff000008badfd0>] acpi_early_init+0x9c/0xd0
[    0.382832] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
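
The numbers in this oops can be decoded with a few bit operations. The
sketch below is a user-space aid with hypothetical helper names. The field
positions follow the Arm architecture (EC in ESR bits [31:26], fault status
in bits [5:0], AttrIndx in descriptor bits [4:2]). Reading AttrIndx 1 as a
Device memory type is an assumption about this kernel's MAIR setup.

```c
#include <assert.h>
#include <stdint.h>

/* Exception class: ESR bits [31:26]. 0x25 is a data abort from the current EL. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data fault status code: ESR bits [5:0]. 0x21 encodes an alignment fault. */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* AttrIndx: stage 1 descriptor bits [4:2], an index into MAIR_EL1. */
static unsigned int pte_attr_index(uint64_t pte)
{
	return (unsigned int)((pte >> 2) & 0x7);
}

/* Output address of a 64k-page descriptor: bits [47:16]. */
static uint64_t pte_output_addr_64k(uint64_t pte)
{
	return pte & UINT64_C(0x0000ffffffff0000);
}

/* Is addr a multiple of bytes? bytes must be a power of two. */
static int is_aligned(uint64_t addr, unsigned int bytes)
{
	assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
	return (addr & (bytes - 1)) == 0;
}
```

For 'Oops: 96000021' this gives EC 0x25 and fault status 0x21 (alignment
fault). For '*pte=00e8000039710707' it gives output address 0x39710000 and
AttrIndx 1. The faulting address ffff000009f10027 is not 4-byte aligned.
That would be consistent with the DSDT being accessed through a device
mapping, as in point 6b of the original report, but it is an
interpretation and not something confirmed in this thread.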

So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
table' ranges to be merged into a single region at
'0x0000396c0000-0x00003975ffff', which cannot be marked as RESERVED using
'memblock_is_reserved'.

Any pointers?

Regards,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-15 10:58                 ` Bhupesh Sharma
@ 2017-11-16  7:00                     ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-11-16  7:00 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	Bhupesh SHARMA

Bhupesh,

On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
> 
  (snip)

> # dmesg | grep -B 2 -i "ACPI reclaim"
> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code |RUN|  |
> |  |  |  |  |   |WB|WT|WC|UC]
> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code |   |  |  |
> |  |  |  |   |WB|WT|WC|UC]
> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|
> |  |  |  |  |  |  |   |WB|WT|WC|UC]
> 
> 2. Now, I am not sure which kernel layer does the following changes (I am
> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
> DSDT table regions are somehow merged into one memblock_region and appear as
> range '396c0000-3975ffff' in the '/proc/iomem' interface:
> 
> # cat /proc/iomem | grep -A 2 -B 2 39
> 00000000-3961ffff : System RAM
>   00080000-00b6ffff : Kernel code
>   00cb0000-0167ffff : Kernel data
>   0e800000-2e7fffff : Crash kernel
> 39620000-396bffff : reserved
> 396c0000-3975ffff : System RAM
> 39760000-3976ffff : reserved
> 39770000-397affff : reserved
> 397b0000-3989ffff : reserved
> 398a0000-398bffff : reserved
> 398c0000-39d3ffff : reserved
> 39d40000-3ed2ffff : System RAM
> 
  (snip)
> 
> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
> table' ranges to be merged into a single region at
> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
> 'memblock_is_reserved'.

Simple:) The short answer is that memblock_add() does.

The long answer:
First, please note that memblock maintains two types of region lists,
"memory" and "reserved".

efi_init()
    reserve_regions()
        early_init_dt_add_memory_arch()
            memblock_add()
                memblock_add_range(memblock.memory)

The memory regions described in efi.memmap are added to the "memory" list,
and neighboring regions are merged into one; in this case the candidates
are "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.

The secret here is that "Runtime Code" is also marked with the "NOMAP" flag
in reserve_regions(), which creates an isolated region since it now has
a different attribute.
Consequently only "Boot Code" and "ACPI Reclaim Memory" are
unified.
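
(Editorial sketch, not part of the kernel sources.) The merge rule described
above can be modelled in a few lines of C. This is a simplified stand-in for
memblock, assuming regions are added in ascending order and that two
neighbors are merged only when they are contiguous and carry identical
flags. The names toy_region, toy_list, toy_add, toy_example and TOY_NOMAP are
hypothetical; the addresses are the three ranges from the dmesg output
quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NOMAP       0x1u  /* hypothetical stand-in for MEMBLOCK_NOMAP */
#define TOY_MAX_REGIONS 16

struct toy_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

struct toy_list {
	struct toy_region r[TOY_MAX_REGIONS];
	size_t cnt;
};

/*
 * Append a region to the list. Callers must add regions in ascending,
 * non-overlapping order. A new region is folded into the previous one
 * only if it starts exactly where the previous one ends AND both have
 * the same flags.
 */
void toy_add(struct toy_list *l, uint64_t base, uint64_t size,
	     unsigned int flags)
{
	if (l->cnt > 0) {
		struct toy_region *prev = &l->r[l->cnt - 1];

		assert(prev->base + prev->size <= base);
		if (prev->base + prev->size == base && prev->flags == flags) {
			prev->size += size;	/* merge with neighbor */
			return;
		}
	}
	assert(l->cnt < TOY_MAX_REGIONS);
	l->r[l->cnt].base = base;
	l->r[l->cnt].size = size;
	l->r[l->cnt].flags = flags;
	l->cnt++;
}

/* Build the "memory" list for the three EFI descriptors from the dmesg. */
struct toy_list toy_example(void)
{
	struct toy_list l = { .cnt = 0 };

	toy_add(&l, 0x39670000ULL, 0x50000ULL, TOY_NOMAP); /* Runtime Code */
	toy_add(&l, 0x396c0000ULL, 0x50000ULL, 0);         /* Boot Code */
	toy_add(&l, 0x39710000ULL, 0x50000ULL, 0);         /* ACPI Reclaim */
	return l;
}
```

In this model toy_example() ends up with two entries: the NOMAP "Runtime
Code" region on its own, and a single flag-less region spanning
0x396c0000-0x3975ffff, which corresponds to the "System RAM" line seen in
/proc/iomem.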

Look at request_standard_resources(). It handles only the "memory" list,
and does not check whether any part of that memory is also in the
"reserved" list.
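
(Editorial sketch, not part of the kernel sources.) A toy model of that
behavior, assuming the resource name is chosen from the NOMAP flag of each
"memory" entry alone, and assuming the ACPI Reclaim range is present on the
"reserved" list. The names toy_region, toy_resource_name,
toy_intersects_reserved and TOY_NOMAP are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NOMAP 0x1u  /* hypothetical stand-in for MEMBLOCK_NOMAP */

struct toy_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/*
 * Name a "memory" entry the way the loop in the quoted diff does before
 * the proposed change: only the NOMAP flag of the entry is consulted.
 */
const char *toy_resource_name(const struct toy_region *mem)
{
	assert(mem != NULL);
	return (mem->flags & TOY_NOMAP) ? "reserved" : "System RAM";
}

/*
 * Return 1 if any entry of the "reserved" list intersects the given
 * "memory" entry, 0 otherwise. The naming function above never asks
 * this question, which is the point of the sketch.
 */
int toy_intersects_reserved(const struct toy_region *mem,
			    const struct toy_region *resv, size_t n)
{
	uint64_t mem_end = mem->base + mem->size;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t resv_end = resv[i].base + resv[i].size;

		if (resv[i].base < mem_end && mem->base < resv_end)
			return 1;
	}
	return 0;
}
```

For the merged entry 0x396c0000-0x3975ffff with no flags, and a reserved
entry covering 0x39710000-0x3975ffff, toy_resource_name() reports
"System RAM" even though toy_intersects_reserved() returns 1.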

Thanks,
-Takahiro AKASHI

> 
> Any pointers?
> 
> Regards,
> Bhupesh
> 

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-15 10:58                 ` Bhupesh Sharma
@ 2017-11-24  8:47                     ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-11-24  8:47 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, AKASHI Takahiro, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	Bhupesh SHARMA

[snip]
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 7768423b39d3..61d867647cca 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -213,7 +213,7 @@ static void __init request_standard_resources(void)
> > 
> >         for_each_memblock(memory, region) {
> >                 res = alloc_bootmem_low(sizeof(*res));
> > -               if (memblock_is_nomap(region)) {
> > +               if (memblock_is_nomap(region) || memblock_is_reserved(region)) {
> >                         res->name  = "reserved";
> >                         res->flags = IORESOURCE_MEM;
> >                 } else {
> > 
> 

Bhupesh, does insert resource work in efi_init/reserve_regions()?

Thanks
Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-16  7:00                     ` AKASHI Takahiro
@ 2017-11-26  8:29                         ` Bhupesh SHARMA
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-11-26  8:29 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	Bhupesh SHARMA

Hi Akashi,

On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> Bhupesh,
>
> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>>
>   (snip)
>
>> # dmesg | grep -B 2 -i "ACPI reclaim"
>> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code |RUN|  |
>> |  |  |  |  |   |WB|WT|WC|UC]
>> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code |   |  |  |
>> |  |  |  |   |WB|WT|WC|UC]
>> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|
>> |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>
>> 2. Now, I am not sure which kernel layer does the following changes (I am
>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
>> DSDT table regions are somehow merged into one memblock_region and appear as
>> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>>
>> # cat /proc/iomem | grep -A 2 -B 2 39
>> 00000000-3961ffff : System RAM
>>   00080000-00b6ffff : Kernel code
>>   00cb0000-0167ffff : Kernel data
>>   0e800000-2e7fffff : Crash kernel
>> 39620000-396bffff : reserved
>> 396c0000-3975ffff : System RAM
>> 39760000-3976ffff : reserved
>> 39770000-397affff : reserved
>> 397b0000-3989ffff : reserved
>> 398a0000-398bffff : reserved
>> 398c0000-39d3ffff : reserved
>> 39d40000-3ed2ffff : System RAM
>>
>   (snip)
>>
>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
>> table' ranges to be merged into a single region at
>> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
>> 'memblock_is_reserved'.
>
> Simple:) The short answer is that memblock_add() does.
>
> The long answer:
> First, please note that memblock maintains two type of regions list,
> "memory" and "reserved".
>
> efi_init()
>     reserve_regions()
>         early_init_dt_add_memory_arch()
>             memblock_add()
>                 memblock_add_range(memblock.memory)
>
> The memory regions described in efi.memmap are added to "memory" list
> with all the neighboring regions being merged into ones,
> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.
>
> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
> reserve_regions(), which creates an isolated region since it now has
> a different attribute.
> Consequently only "Boot Code" and "ACPI Reclaim Memory" are
> unified.
>
> Look at request_standard_resources(). It handles only "memory" list,
> and doesn't care about whether any arbitrary part of memory is in
> "reserved" list or not.

Thanks for the pointers. I did some experiments, traversed the whole
memblock path, and now see how these two regions get merged into a
single region, which 'request_standard_resources()' later recognizes
as a System RAM region rather than a RESERVED region.

I recently reproduced this on an APM Mustang with the latest kernel as
well, when ACPI is used to boot the machine. This makes me believe that
this is a generic issue for arm64 machines that run the 4.14 kernel and
use acpi=force as the boot method.

I am not sure whether a fix or hack would be suitable for all underlying
arm64 machines, but I am trying one on the arm64 machines I have to
see if it fixes the issue.

@Ard:

Hi Ard,

I think it will take some time to create and test a clean solution for
all arm64 boards. In the meantime, should we consider reverting
commit [1] to make sure that ACPI-enabled arm64 machines can boot with
4.14?

Please let me know your opinion.

[1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
ACPI reclaim memory as MEMBLOCK_NOMAP)

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-26  8:29                         ` Bhupesh SHARMA
@ 2017-12-04 14:02                             ` Ard Biesheuvel
  -1 siblings, 0 replies; 135+ messages in thread
From: Ard Biesheuvel @ 2017-12-04 14:02 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: AKASHI Takahiro, Bhupesh Sharma, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse

On 26 November 2017 at 08:29, Bhupesh SHARMA <bhupesh.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Akashi,
>
> On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> Bhupesh,
>>
>> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>>>
>>   (snip)
>>
>>> # dmesg | grep -B 2 -i "ACPI reclaim"
>>> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code |RUN|  |
>>> |  |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code |   |  |  |
>>> |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|
>>> |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>
>>> 2. Now, I am not sure which kernel layer does the following changes (I am
>>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
>>> DSDT table regions are somehow merged into one memblock_region and appear as
>>> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>>>
>>> # cat /proc/iomem | grep -A 2 -B 2 39
>>> 00000000-3961ffff : System RAM
>>>   00080000-00b6ffff : Kernel code
>>>   00cb0000-0167ffff : Kernel data
>>>   0e800000-2e7fffff : Crash kernel
>>> 39620000-396bffff : reserved
>>> 396c0000-3975ffff : System RAM
>>> 39760000-3976ffff : reserved
>>> 39770000-397affff : reserved
>>> 397b0000-3989ffff : reserved
>>> 398a0000-398bffff : reserved
>>> 398c0000-39d3ffff : reserved
>>> 39d40000-3ed2ffff : System RAM
>>>
>>   (snip)
>>>
>>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
>>> table' ranges to be merged into a single region at
>>> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
>>> 'memblock_is_reserved'.
>>
>> Simple:) The short answer is that memblock_add() does.
>>
>> The long answer:
>> First, please note that memblock maintains two type of regions list,
>> "memory" and "reserved".
>>
>> efi_init()
>>     reserve_regions()
>>         early_init_dt_add_memory_arch()
>>             memblock_add()
>>                 memblock_add_range(memblock.memory)
>>
>> The memory regions described in efi.memmap are added to "memory" list
>> with all the neighboring regions being merged into ones,
>> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.
>>
>> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
>> reserve_regions(), which creates an isolated region since it now has
>> a different attribute.
>> Consequently only "Boot Code" and "ACPI Reclaim Memory" are
>> unified.
>>
>> Look at request_standard_resources(). It handles only "memory" list,
>> and doesn't care about whether any arbitrary part of memory is in
>> "reserved" list or not.
>
> Thanks for the pointers. Now I did some experiments and traversed the
> whole memblock path and I see
> how these two regions get merged into a single region which is later
> on recognized by
> 'request_standard_resources()' as a System RAM region rather than a
> RESERVED region.
>
> I recently reproduced this on a APM mustang with latest kernel as well
> when acpi is used to boot the machine, which makes me believe that
> this is a generic issue for arm64 machines with the 4.14 kernel and if
> they use acpi=force as the boot method.
>
> I am not sure, if a fix/or hack would be suitable for all underlying
> arm64 machines, but I am trying one on the arm64 machines I have to
> see if it fixes the issue.
>
> @Ard:
>
> Hi Ard,
>
> I think to create and test a clean solution for all arm64 boards it
> will take some time, in the meantime should we consider reverting the
> commit [1] to make sure that acpi enabled arm64 machines can boot with
> 4.14?
>
> Please let me know your opinion.
>
> [1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
> ACPI reclaim memory as MEMBLOCK_NOMAP)
>

I don't think that is really going to help tbh.

ACPI reclaim regions are not the only regions that are
memblock_reserve()d and need to be reserved by the incoming kernel as
well. So as far as I can tell, this is a symptom of an underlying
issue that we will need to solve, and reverting the code that exposed
it will not make the bug go away.

-- 
Ard.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-04 14:02                             ` Ard Biesheuvel
@ 2017-12-12 21:51                                 ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-12 21:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Bhupesh SHARMA, AKASHI Takahiro, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

Hi Ard, Akashi

On Mon, Dec 4, 2017 at 7:32 PM, Ard Biesheuvel
<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On 26 November 2017 at 08:29, Bhupesh SHARMA <bhupesh.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Akashi,
>>
>> On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro
>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> Bhupesh,
>>>
>>> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>>>>
>>>   (snip)
>>>
>>>> # dmesg | grep -B 2 -i "ACPI reclaim"
>>>> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code |RUN|  |
>>>> |  |  |  |  |   |WB|WT|WC|UC]
>>>> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code |   |  |  |
>>>> |  |  |  |   |WB|WT|WC|UC]
>>>> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|
>>>> |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>>
>>>> 2. Now, I am not sure which kernel layer does the following changes (I am
>>>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
>>>> DSDT table regions are somehow merged into one memblock_region and appear as
>>>> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>>>>
>>>> # cat /proc/iomem | grep -A 2 -B 2 39
>>>> 00000000-3961ffff : System RAM
>>>>   00080000-00b6ffff : Kernel code
>>>>   00cb0000-0167ffff : Kernel data
>>>>   0e800000-2e7fffff : Crash kernel
>>>> 39620000-396bffff : reserved
>>>> 396c0000-3975ffff : System RAM
>>>> 39760000-3976ffff : reserved
>>>> 39770000-397affff : reserved
>>>> 397b0000-3989ffff : reserved
>>>> 398a0000-398bffff : reserved
>>>> 398c0000-39d3ffff : reserved
>>>> 39d40000-3ed2ffff : System RAM
>>>>
>>>   (snip)
>>>>
>>>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
>>>> table' ranges to be merged into a single region at
>>>> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
>>>> 'memblock_is_reserved'.
>>>
>>> Simple:) The short answer is that memblock_add() does.
>>>
>>> The long answer:
>>> First, please note that memblock maintains two type of regions list,
>>> "memory" and "reserved".
>>>
>>> efi_init()
>>>     reserve_regions()
>>>         early_init_dt_add_memory_arch()
>>>             memblock_add()
>>>                 memblock_add_range(memblock.memory)
>>>
>>> The memory regions described in efi.memmap are added to "memory" list
>>> with all the neighboring regions being merged into ones,
>>> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.
>>>
>>> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
>>> reserve_regions(), which creates an isolated region since it now has
>>> a different attribute.
>>> Consequently only "Boot Code" and "ACPI Reclaim Memory" are
>>> unified.
>>>
>>> Look at request_standard_resources(). It handles only "memory" list,
>>> and doesn't care about whether any arbitrary part of memory is in
>>> "reserved" list or not.
>>
>> Thanks for the pointers. Now I did some experiments and traversed the
>> whole memblock path and I see
>> how these two regions get merged into a single region which is later
>> on recognized by
>> 'request_standard_resources()' as a System RAM region rather than a
>> RESERVED region.
>>
>> I recently reproduced this on a APM mustang with latest kernel as well
>> when acpi is used to boot the machine, which makes me believe that
>> this is a generic issue for arm64 machines with the 4.14 kernel and if
>> they use acpi=force as the boot method.
>>
>> I am not sure, if a fix/or hack would be suitable for all underlying
>> arm64 machines, but I am trying one on the arm64 machines I have to
>> see if it fixes the issue.
>>
>> @Ard:
>>
>> Hi Ard,
>>
>> I think to create and test a clean solution for all arm64 boards it
>> will take some time, in the meantime should we consider reverting the
>> commit [1] to make sure that acpi enabled arm64 machines can boot with
>> 4.14?
>>
>> Please let me know your opinion.
>>
>> [1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
>> ACPI reclaim memory as MEMBLOCK_NOMAP)
>>
>
> I don't think that is really going to help tbh.
>
> ACPI reclaim regions are not the only regions that are
> memblock_reserve()d and need to be reserved by the incoming kernel as
> well. So as far as I can tell, this is a symptom of an underlying
> issue that we will need to solve, and reverting the code that exposed
> it will not make the bug go away.
>
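
To make the merging behaviour Akashi describes above concrete, here is a
minimal user-space sketch in C. It is not the kernel's memblock code:
'struct region', 'FLAG_NOMAP' and 'add_region()' are hypothetical names
used only for illustration, and the sketch assumes that regions are added
in ascending address order with their final flags already known. A new
region is folded into the previous one only when the two are adjacent and
carry the same flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_NONE   0u
#define FLAG_NOMAP  1u  /* hypothetical stand-in for MEMBLOCK_NOMAP */
#define MAX_REGIONS 16

struct region {
    uint64_t base;
    uint64_t size;
    unsigned int flags;
};

struct region_list {
    struct region r[MAX_REGIONS];
    size_t cnt;
};

/*
 * Append [base, base + size) to the list. Callers must add regions in
 * ascending address order. The new region is folded into the previous
 * entry only if it starts exactly where that entry ends and has the
 * same flags; otherwise it becomes a separate entry.
 */
static void add_region(struct region_list *l, uint64_t base, uint64_t size,
                       unsigned int flags)
{
    if (l->cnt > 0) {
        struct region *prev = &l->r[l->cnt - 1];

        assert(base >= prev->base + prev->size);
        if (prev->base + prev->size == base && prev->flags == flags) {
            prev->size += size;
            return;
        }
    }
    assert(l->cnt < MAX_REGIONS);
    l->r[l->cnt].base = base;
    l->r[l->cnt].size = size;
    l->r[l->cnt].flags = flags;
    l->cnt++;
}
```

Feeding it "Runtime Code" (0x39670000, NOMAP), "Boot Code" (0x396c0000)
and "ACPI Reclaim Memory" (0x39710000), each 0x50000 bytes long, leaves
two entries: the isolated NOMAP region and a single unified region
0x396c0000-0x3975ffff, which is the range that shows up as "System RAM"
in '/proc/iomem'.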

Looking deeper into the issue: the arm64 kexec-tools uses the
'linux,usable-memory-range' dt property to let the crash dump kernel
identify its own usable memory and exclude, at boot time, any
other memory areas that belong to the panicked kernel.
(see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
for details)

1). Now when 'kexec -p' is executed, this node is patched up only
with the crashkernel memory range:

                /* add linux,usable-memory-range */
                nodeoffset = fdt_path_offset(new_buf, "/chosen");
                result = fdt_setprop_range(new_buf, nodeoffset,
                                PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
                                address_cells, size_cells);

(see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
for details)
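
For readers unfamiliar with how such a <base size> pair ends up in the
device tree, the following is a rough, self-contained sketch of the
encoding. It is not the kexec-tools implementation of
fdt_setprop_range(): 'put_cells()' and 'encode_range()' are hypothetical
helpers, and the sketch assumes that address_cells and size_cells are
each either 1 or 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store `val` as `cells` big-endian 32-bit cells at `p` and return the
 * position just past the written bytes. Device tree property values are
 * big-endian regardless of the CPU's byte order.
 */
static uint8_t *put_cells(uint8_t *p, uint64_t val, unsigned int cells)
{
    assert(cells == 1 || cells == 2);
    if (cells == 1)
        assert(val <= UINT32_MAX);

    for (unsigned int i = 0; i < cells * 4; i++) {
        unsigned int shift = 8 * (cells * 4 - 1 - i);

        *p++ = (uint8_t)(val >> shift);
    }
    return p;
}

/* Encode one <base size> pair; returns the number of bytes written. */
static size_t encode_range(uint8_t *buf, uint64_t base, uint64_t size,
                           unsigned int address_cells,
                           unsigned int size_cells)
{
    uint8_t *p = buf;

    p = put_cells(p, base, address_cells);
    p = put_cells(p, size, size_cells);
    return (size_t)(p - buf);
}
```

With two address cells and two size cells, the crash kernel range
0x0e800000-0x2e7fffff (base 0x0e800000, size 0x20000000) becomes a
16-byte property value. Only that one pair is written, so nothing outside
it is described to the crash dump kernel.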

2). This excludes the ACPI reclaim regions irrespective of whether
they are marked as System RAM or as RESERVED, because the
'linux,usable-memory-range' dt property is patched up only with
'crash_reserved_mem' and not with 'system_memory_ranges'.

3). As a result when the crashkernel boots up it doesn't find this
ACPI memory and crashes while trying to access the same:

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d

[snip..]

Reserved memory range
000000000e800000-000000002e7fffff (0)

Coredump memory ranges
0000000000000000-000000000e7fffff (0)
000000002e800000-000000003961ffff (0)
0000000039d40000-000000003ed2ffff (0)
000000003ed60000-000000003fbfffff (0)
0000001040000000-0000001ffbffffff (0)
0000002000000000-0000002ffbffffff (0)
0000009000000000-0000009ffbffffff (0)
000000a000000000-000000affbffffff (0)
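
The point of steps 2) and 3) can be checked with a small containment
test. This is only an illustrative sketch: 'struct mem_range' and
'range_contains()' are hypothetical names, and both ranges use inclusive
end addresses like the listings above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mem_range {
    uint64_t start;
    uint64_t end;   /* inclusive, as printed by kexec -d */
};

/* Return true if `inner` lies entirely within `outer`. */
static bool range_contains(const struct mem_range *outer,
                           const struct mem_range *inner)
{
    assert(outer->start <= outer->end);
    assert(inner->start <= inner->end);
    return inner->start >= outer->start && inner->end <= outer->end;
}
```

With the reserved range 0x0e800000-0x2e7fffff as 'outer' and the ACPI
Reclaim Memory range 0x39710000-0x3975ffff as 'inner', the check fails,
so a crash kernel capped to the reserved range has no mapping for the
ACPI tables.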

4). So if we either revert Ard's patch, or just comment out the capping
of the memory passed to the crash kernel inside
'arch/arm64/mm/init.c' (see below):

static void __init fdt_enforce_memory_region(void)
{
        struct memblock_region reg = {
                .size = 0,
        };

        of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);

        /* commented out for this experiment:
         * if (reg.size)
         *         memblock_cap_memory_range(reg.base, reg.size);
         */
}

5). Both the above temporary solutions fix the problem.

6). However exposing all System RAM regions to the crashkernel is not
advisable and may cause the crashkernel or some crashkernel drivers to
fail.

6a). I am now trying an approach in which the kernel adds the ACPI
reclaim regions to '/proc/iomem' as separate 'ACPI reclaim region'
entries, and the user-space 'kexec-tools' picks these regions up from
'/proc/iomem' and adds them to the 'linux,usable-memory-range' dt
property.

6b). The kernel code currently looks like the following:

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 30ad2f085d1f..867bdec7c692 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
 {
     struct memblock_region *region;
     struct resource *res;
+    phys_addr_t addr_start, addr_end;

     kernel_code.start   = __pa_symbol(_text);
     kernel_code.end     = __pa_symbol(__init_begin - 1);
@@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
             res->name  = "reserved";
             res->flags = IORESOURCE_MEM;
         } else {
-            res->name  = "System RAM";
-            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+            addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
+            addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
+            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
+                res->name  = "ACPI reclaim region";
+                res->flags = IORESOURCE_MEM;
+            } else {
+                res->name  = "System RAM";
+                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+            }
         }
+
         res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
         res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;

@@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)

     request_standard_resources();

+    efi_memmap_unmap();
     early_ioremap_reset();

     if (acpi_disabled)
diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 80d1a885def5..a7c522eac640 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -259,7 +259,6 @@ void __init efi_init(void)

     reserve_regions();
     efi_esrt_init();
-    efi_memmap_unmap();

     memblock_reserve(params.mmap & PAGE_MASK,
              PAGE_ALIGN(params.mmap_size +


After this change the ACPI reclaim regions are properly recognized in
'/proc/iomem':

# cat /proc/iomem | grep -i ACPI
396c0000-3975ffff : ACPI reclaim region
39770000-397affff : ACPI reclaim region
398a0000-398bffff : ACPI reclaim region
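
On the user-space side, picking such entries out of '/proc/iomem' could
look roughly like the sketch below. This is not the kexec-tools change
itself: 'parse_iomem_line()' is a hypothetical helper, and the
"ACPI reclaim region" label only exists with the kernel change proposed
above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "/proc/iomem"-style line of the form
 *   "<start>-<end> : <name>"
 * (hex addresses, optional leading spaces, optional trailing newline).
 * Returns true and fills *start / *end only if the name equals `want`.
 */
static bool parse_iomem_line(const char *line, const char *want,
                             uint64_t *start, uint64_t *end)
{
    uint64_t s, e;
    int consumed = 0;
    const char *name;
    size_t len;

    assert(line && want && start && end);

    if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n", &s, &e, &consumed) != 2)
        return false;
    if (consumed == 0 || s > e)     /* " : " separator missing, or bad range */
        return false;

    name = line + consumed;
    len = strcspn(name, "\n");      /* ignore a trailing newline */
    if (len != strlen(want) || strncmp(name, want, len) != 0)
        return false;

    *start = s;
    *end = e;
    return true;
}
```

Called on each line with "ACPI reclaim region" as 'want', it would
return the three ranges listed above, which could then be appended to
the ranges written into 'linux,usable-memory-range'.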

6c). I am currently changing the 'kexec-tools' and will finish the
testing over the next few days.

I just wanted to know your opinion on this issue, so that I can
propose a fix along the above lines.

Also Cc'ing kexec mailing list for more inputs on changes proposed to
kexec-tools.

Thanks,
Bhupesh

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-12 21:51                                 ` Bhupesh Sharma
@ 2017-12-13 10:26                                     ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-13 10:26 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

Bhupesh, Ard,

On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> Hi Ard, Akashi
> 
(snip)

> Looking deeper into the issue, since the arm64 kexec-tools uses the
> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> identify its own usable memory and exclude, at its boot time, any
> other memory areas that are part of the panicked kernel's memory.
> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> , for details)

Right.

> 1). Now when 'kexec -p' is executed, this node is patched up only
> with the crashkernel memory range:
> 
>                 /* add linux,usable-memory-range */
>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>                 result = fdt_setprop_range(new_buf, nodeoffset,
>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>                                 address_cells, size_cells);
> 
> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> , for details)
> 
> 2). This excludes the ACPI reclaim regions irrespective of whether
> they are marked as System RAM or as RESERVED. As,
> 'linux,usable-memory-range' dt node is patched up only with
> 'crash_reserved_mem' and not 'system_memory_ranges'
> 
> 3). As a result when the crashkernel boots up it doesn't find this
> ACPI memory and crashes while trying to access the same:
> 
> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> 
> [snip..]
> 
> Reserved memory range
> 000000000e800000-000000002e7fffff (0)
> 
> Coredump memory ranges
> 0000000000000000-000000000e7fffff (0)
> 000000002e800000-000000003961ffff (0)
> 0000000039d40000-000000003ed2ffff (0)
> 000000003ed60000-000000003fbfffff (0)
> 0000001040000000-0000001ffbffffff (0)
> 0000002000000000-0000002ffbffffff (0)
> 0000009000000000-0000009ffbffffff (0)
> 000000a000000000-000000affbffffff (0)
> 
> 4). So if we revert Ard's patch or just comment the fixing up of the
> memory cap'ing passed to the crash kernel inside
> 'arch/arm64/mm/init.c' (see below):
> 
> static void __init fdt_enforce_memory_region(void)
> {
>         struct memblock_region reg = {
>                 .size = 0,
>         };
> 
>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> 
>         /* commented out for this experiment:
>          * if (reg.size)
>          *         memblock_cap_memory_range(reg.base, reg.size);
>          */
> }

Please just don't do that. It can cause fatal damage to the
memory contents of the *crashed* kernel.

> 5). Both the above temporary solutions fix the problem.
> 
> 6). However exposing all System RAM regions to the crashkernel is not
> advisable and may cause the crashkernel or some crashkernel drivers to
> fail.
> 
> 6a). I am trying an approach now, where the ACPI reclaim regions are
> added to '/proc/iomem' separately as ACPI reclaim regions by the
> kernel code and on the other hand the user-space 'kexec-tools' will
> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> dt node 'linux,usable-memory-range'

I still don't understand why we need to carry over the information
about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
such regions are free to be reused by the kernel after some point of
initialization. Why does the crash dump kernel need to know about them?

(In other words, can or should we skip some part of the ACPI-related init
code in the crash dump kernel?)

Thanks,
-Takahiro AKASHI

> 6b). The kernel code currently looks like the following:
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 30ad2f085d1f..867bdec7c692 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
>  {
>      struct memblock_region *region;
>      struct resource *res;
> +    phys_addr_t addr_start, addr_end;
> 
>      kernel_code.start   = __pa_symbol(_text);
>      kernel_code.end     = __pa_symbol(__init_begin - 1);
> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
>              res->name  = "reserved";
>              res->flags = IORESOURCE_MEM;
>          } else {
> -            res->name  = "System RAM";
> -            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +            addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
> +            addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
> +            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
> +                res->name  = "ACPI reclaim region";
> +                res->flags = IORESOURCE_MEM;
> +            } else {
> +                res->name  = "System RAM";
> +                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +            }
>          }
> +
>          res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>          res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> 
> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)
> 
>      request_standard_resources();
> 
> +    efi_memmap_unmap();
>      early_ioremap_reset();
> 
>      if (acpi_disabled)
> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index 80d1a885def5..a7c522eac640 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -259,7 +259,6 @@ void __init efi_init(void)
> 
>      reserve_regions();
>      efi_esrt_init();
> -    efi_memmap_unmap();
> 
>      memblock_reserve(params.mmap & PAGE_MASK,
>               PAGE_ALIGN(params.mmap_size +
> 
> 
> After this change the ACPI reclaim regions are properly recognized in
> '/proc/iomem':
> 
> # cat /proc/iomem | grep -i ACPI
> 396c0000-3975ffff : ACPI reclaim region
> 39770000-397affff : ACPI reclaim region
> 398a0000-398bffff : ACPI reclaim region
> 
> 6c). I am currently changing the 'kexec-tools' and will finish the
> testing over the next few days.
> 
> I just wanted to know your opinion on this issue, so that I will be
> able to propose a fix on the above lines.
> 
> Also Cc'ing kexec mailing list for more inputs on changes proposed to
> kexec-tools.
> 
> Thanks,
> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-13 10:26                                     ` AKASHI Takahiro
@ 2017-12-13 10:49                                         ` Ard Biesheuvel
  -1 siblings, 0 replies; 135+ messages in thread
From: Ard Biesheuvel @ 2017-12-13 10:49 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On 13 December 2017 at 10:26, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> Bhupesh, Ard,
>
> On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> Hi Ard, Akashi
>>
> (snip)
>
>> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> identify its own usable memory and exclude, at its boot time, any
>> other memory areas that are part of the panicked kernel's memory.
>> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> , for details)
>
> Right.
>
>> 1). Now when 'kexec -p' is executed, this node is patched up only
>> with the crashkernel memory range:
>>
>>                 /* add linux,usable-memory-range */
>>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>>                 result = fdt_setprop_range(new_buf, nodeoffset,
>>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>>                                 address_cells, size_cells);
>>
>> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> , for details)
>>
>> 2). This excludes the ACPI reclaim regions irrespective of whether
>> they are marked as System RAM or as RESERVED. As,
>> 'linux,usable-memory-range' dt node is patched up only with
>> 'crash_reserved_mem' and not 'system_memory_ranges'
>>
>> 3). As a result when the crashkernel boots up it doesn't find this
>> ACPI memory and crashes while trying to access the same:
>>
>> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>>
>> [snip..]
>>
>> Reserved memory range
>> 000000000e800000-000000002e7fffff (0)
>>
>> Coredump memory ranges
>> 0000000000000000-000000000e7fffff (0)
>> 000000002e800000-000000003961ffff (0)
>> 0000000039d40000-000000003ed2ffff (0)
>> 000000003ed60000-000000003fbfffff (0)
>> 0000001040000000-0000001ffbffffff (0)
>> 0000002000000000-0000002ffbffffff (0)
>> 0000009000000000-0000009ffbffffff (0)
>> 000000a000000000-000000affbffffff (0)
>>
>> 4). So if we revert Ard's patch or just comment the fixing up of the
>> memory cap'ing passed to the crash kernel inside
>> 'arch/arm64/mm/init.c' (see below):
>>
>> static void __init fdt_enforce_memory_region(void)
>> {
>>         struct memblock_region reg = {
>>                 .size = 0,
>>         };
>>
>>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>>
>>         if (reg.size)
>>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> }
>
> Please just don't do that. It can cause a fatal damage on
> memory contents of the *crashed* kernel.
>
>> 5). Both the above temporary solutions fix the problem.
>>
>> 6). However exposing all System RAM regions to the crashkernel is not
>> advisable and may cause the crashkernel or some crashkernel drivers to
>> fail.
>>
>> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> kernel code and on the other hand the user-space 'kexec-tools' will
>> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> dt node 'linux,usable-memory-range'
>
> I still don't understand why we need to carry over the information
> about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> such regions are free to be reused by the kernel after some point of
> initialization. Why does crash dump kernel need to know about them?
>

Not really. According to the UEFI spec, they can be reclaimed after
the OS has initialized, i.e., when it has consumed the ACPI tables and
no longer needs them. Of course, in order to be able to boot a kexec
kernel, those regions need to be preserved, which is why they are
memblock_reserve()'d now.

So it seems that kexec does not honour the memblock_reserve() table
when booting the next kernel.

> (In other words, can or should we skip some part of ACPI-related init code
> on crash dump kernel?)
>

I don't think so. The change to the handling of ACPI reclaim regions
only revealed the bug; it did not create it, given that other
memblock_reserve()'d regions may be affected as well.


>> 6b). The kernel code currently looks like the following:
>>
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 30ad2f085d1f..867bdec7c692 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
>>  {
>>      struct memblock_region *region;
>>      struct resource *res;
>> +    phys_addr_t addr_start, addr_end;
>>
>>      kernel_code.start   = __pa_symbol(_text);
>>      kernel_code.end     = __pa_symbol(__init_begin - 1);
>> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
>>              res->name  = "reserved";
>>              res->flags = IORESOURCE_MEM;
>>          } else {
>> -            res->name  = "System RAM";
>> -            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> +            addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
>> +            addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
>> +            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
>> +                res->name  = "ACPI reclaim region";
>> +                res->flags = IORESOURCE_MEM;
>> +            } else {
>> +                res->name  = "System RAM";
>> +                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> +            }
>>          }
>> +
>>          res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>>          res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
>>
>> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)
>>
>>      request_standard_resources();
>>
>> +    efi_memmap_unmap();
>>      early_ioremap_reset();
>>
>>      if (acpi_disabled)
>> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
>> index 80d1a885def5..a7c522eac640 100644
>> --- a/drivers/firmware/efi/arm-init.c
>> +++ b/drivers/firmware/efi/arm-init.c
>> @@ -259,7 +259,6 @@ void __init efi_init(void)
>>
>>      reserve_regions();
>>      efi_esrt_init();
>> -    efi_memmap_unmap();
>>
>>      memblock_reserve(params.mmap & PAGE_MASK,
>>               PAGE_ALIGN(params.mmap_size +
>>
>>
>> After this change the ACPI reclaim regions are properly recognized in
>> '/proc/iomem':
>>
>> # cat /proc/iomem | grep -i ACPI
>> 396c0000-3975ffff : ACPI reclaim region
>> 39770000-397affff : ACPI reclaim region
>> 398a0000-398bffff : ACPI reclaim region
>>
>> 6c). I am currently changing the 'kexec-tools' and will finish the
>> testing over the next few days.
>>
>> I just wanted to know your opinion on this issue, so that I will be
>> able to propose a fix on the above lines.
>>
>> Also Cc'ing kexec mailing list for more inputs on changes proposed to
>> kexec-tools.
>>
>> Thanks,
>> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-13 10:49                                         ` Ard Biesheuvel
@ 2017-12-13 12:16                                             ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-13 12:16 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> On 13 December 2017 at 10:26, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > Bhupesh, Ard,
> >
> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> Hi Ard, Akashi
> >>
> > (snip)
> >
> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> identify its own usable memory and exclude, at its boot time, any
> >> other memory areas that are part of the panicked kernel's memory.
> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> , for details)
> >
> > Right.
> >
> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> with the crashkernel memory range:
> >>
> >>                 /* add linux,usable-memory-range */
> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >>                                 address_cells, size_cells);
> >>
> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> , for details)
> >>
> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> they are marked as System RAM or as RESERVED. As,
> >> 'linux,usable-memory-range' dt node is patched up only with
> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >>
> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> ACPI memory and crashes while trying to access the same:
> >>
> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >>
> >> [snip..]
> >>
> >> Reserved memory range
> >> 000000000e800000-000000002e7fffff (0)
> >>
> >> Coredump memory ranges
> >> 0000000000000000-000000000e7fffff (0)
> >> 000000002e800000-000000003961ffff (0)
> >> 0000000039d40000-000000003ed2ffff (0)
> >> 000000003ed60000-000000003fbfffff (0)
> >> 0000001040000000-0000001ffbffffff (0)
> >> 0000002000000000-0000002ffbffffff (0)
> >> 0000009000000000-0000009ffbffffff (0)
> >> 000000a000000000-000000affbffffff (0)
> >>
> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> memory cap'ing passed to the crash kernel inside
> >> 'arch/arm64/mm/init.c' (see below):
> >>
> >> static void __init fdt_enforce_memory_region(void)
> >> {
> >>         struct memblock_region reg = {
> >>                 .size = 0,
> >>         };
> >>
> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>
> >>         if (reg.size)
> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> }
> >
> > Please just don't do that. It can cause a fatal damage on
> > memory contents of the *crashed* kernel.
> >
> >> 5). Both the above temporary solutions fix the problem.
> >>
> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> fail.
> >>
> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> dt node 'linux,usable-memory-range'
> >
> > I still don't understand why we need to carry over the information
> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > such regions are free to be reused by the kernel after some point of
> > initialization. Why does crash dump kernel need to know about them?
> >
> 
> Not really. According to the UEFI spec, they can be reclaimed after
> the OS has initialized, i.e., when it has consumed the ACPI tables and
> no longer needs them. Of course, in order to be able to boot a kexec
> kernel, those regions needs to be preserved, which is why they are
> memblock_reserve()'d now.

For my better understanding, who actually accesses such regions
at boot time: UEFI itself or the EFI stub?

> So it seems that kexec does not honour the memblock_reserve() table
> when booting the next kernel.

Not really.

> > (In other words, can or should we skip some part of ACPI-related init code
> > on crash dump kernel?)
> >
> 
> I don't think so. And the change to the handling of ACPI reclaim
> regions only revealed the bug, not created it (given that other
> memblock_reserve regions may be affected as well)

Whether we should honor such reserved regions across kexec depends
on the specific nature of each one, so we will have to handle them
one by one. As a matter of fact, no information about "reserved"
memblocks is exposed to user space (via /proc/iomem).

-Takahiro AKASHI


> 
> >> 6b). The kernel code currently looks like the following:
> >>
> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >> index 30ad2f085d1f..867bdec7c692 100644
> >> --- a/arch/arm64/kernel/setup.c
> >> +++ b/arch/arm64/kernel/setup.c
> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
> >>  {
> >>      struct memblock_region *region;
> >>      struct resource *res;
> >> +    phys_addr_t addr_start, addr_end;
> >>
> >>      kernel_code.start   = __pa_symbol(_text);
> >>      kernel_code.end     = __pa_symbol(__init_begin - 1);
> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
> >>              res->name  = "reserved";
> >>              res->flags = IORESOURCE_MEM;
> >>          } else {
> >> -            res->name  = "System RAM";
> >> -            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >> +            addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
> >> +            addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
> >> +            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
> >> +                res->name  = "ACPI reclaim region";
> >> +                res->flags = IORESOURCE_MEM;
> >> +            } else {
> >> +                res->name  = "System RAM";
> >> +                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >> +            }
> >>          }
> >> +
> >>          res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> >>          res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> >>
> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)
> >>
> >>      request_standard_resources();
> >>
> >> +    efi_memmap_unmap();
> >>      early_ioremap_reset();
> >>
> >>      if (acpi_disabled)
> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> >> index 80d1a885def5..a7c522eac640 100644
> >> --- a/drivers/firmware/efi/arm-init.c
> >> +++ b/drivers/firmware/efi/arm-init.c
> >> @@ -259,7 +259,6 @@ void __init efi_init(void)
> >>
> >>      reserve_regions();
> >>      efi_esrt_init();
> >> -    efi_memmap_unmap();
> >>
> >>      memblock_reserve(params.mmap & PAGE_MASK,
> >>               PAGE_ALIGN(params.mmap_size +
> >>
> >>
> >> After this change the ACPI reclaim regions are properly recognized in
> >> '/proc/iomem':
> >>
> >> # cat /proc/iomem | grep -i ACPI
> >> 396c0000-3975ffff : ACPI reclaim region
> >> 39770000-397affff : ACPI reclaim region
> >> 398a0000-398bffff : ACPI reclaim region
> >>
> >> 6c). I am currently changing the 'kexec-tools' and will finish the
> >> testing over the next few days.
> >>
> >> I just wanted to know your opinion on this issue, so that I will be
> >> able to propose a fix on the above lines.
> >>
> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to
> >> kexec-tools.
> >>
> >> Thanks,
> >> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-13 12:16                                             ` AKASHI Takahiro
@ 2017-12-13 12:17                                                 ` Ard Biesheuvel
  -1 siblings, 0 replies; 135+ messages in thread
From: Ard Biesheuvel @ 2017-12-13 12:17 UTC (permalink / raw)
  To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On 13 December 2017 at 12:16, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> On 13 December 2017 at 10:26, AKASHI Takahiro
>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > Bhupesh, Ard,
>> >
>> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> Hi Ard, Akashi
>> >>
>> > (snip)
>> >
>> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> identify its own usable memory and exclude, at its boot time, any
>> >> other memory areas that are part of the panicked kernel's memory.
>> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> , for details)
>> >
>> > Right.
>> >
>> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> with the crashkernel memory range:
>> >>
>> >>                 /* add linux,usable-memory-range */
>> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>                                 address_cells, size_cells);
>> >>
>> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> , for details)
>> >>
>> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> they are marked as System RAM or as RESERVED. As,
>> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >>
>> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> ACPI memory and crashes while trying to access the same:
>> >>
>> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>> >>
>> >> [snip..]
>> >>
>> >> Reserved memory range
>> >> 000000000e800000-000000002e7fffff (0)
>> >>
>> >> Coredump memory ranges
>> >> 0000000000000000-000000000e7fffff (0)
>> >> 000000002e800000-000000003961ffff (0)
>> >> 0000000039d40000-000000003ed2ffff (0)
>> >> 000000003ed60000-000000003fbfffff (0)
>> >> 0000001040000000-0000001ffbffffff (0)
>> >> 0000002000000000-0000002ffbffffff (0)
>> >> 0000009000000000-0000009ffbffffff (0)
>> >> 000000a000000000-000000affbffffff (0)
>> >>
>> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> memory cap'ing passed to the crash kernel inside
>> >> 'arch/arm64/mm/init.c' (see below):
>> >>
>> >> static void __init fdt_enforce_memory_region(void)
>> >> {
>> >>         struct memblock_region reg = {
>> >>                 .size = 0,
>> >>         };
>> >>
>> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>
>> >>         if (reg.size)
>> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >> }
>> >
>> > Please just don't do that. It can cause a fatal damage on
>> > memory contents of the *crashed* kernel.
>> >
>> >> 5). Both the above temporary solutions fix the problem.
>> >>
>> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> fail.
>> >>
>> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> dt node 'linux,usable-memory-range'
>> >
>> > I still don't understand why we need to carry over the information
>> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> > such regions are free to be reused by the kernel after some point of
>> > initialization. Why does crash dump kernel need to know about them?
>> >
>>
>> Not really. According to the UEFI spec, they can be reclaimed after
>> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> no longer needs them. Of course, in order to be able to boot a kexec
>> kernel, those regions needs to be preserved, which is why they are
>> memblock_reserve()'d now.
>
> For my better understandings, who is actually accessing such regions
> during boot time, uefi itself or efistub?
>

No, only the kernel. This is where the ACPI tables are stored. For
instance, on QEMU we have

 ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
 ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001   01000013)
 ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
 ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
 ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
 ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
 ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
 ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
 ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)

covered by

 efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
 ...
 efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
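
To make the "covered by" relationship concrete, here is a minimal
user-space C sketch (not kernel code) that checks whether a table of a
given address and length lies inside one of the two ACPI reclaim ranges
printed above. The names struct phys_range and covered() are made up
for illustration, and only the two ranges visible in the log are
listed; the entries elided by "..." are not known here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive physical address range, as printed in the "efi:" lines. */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* Only the two ACPI reclaim ranges shown in the log; others are elided. */
static const struct phys_range acpi_reclaim[] = {
	{ 0x788e0000ULL, 0x7894ffffULL },
	{ 0x78970000ULL, 0x7898ffffULL },
};

/* Return true if [addr, addr + len) lies entirely inside one listed range. */
static bool covered(uint64_t addr, uint64_t len)
{
	size_t i;

	if (len == 0)
		return false;

	for (i = 0; i < sizeof(acpi_reclaim) / sizeof(acpi_reclaim[0]); i++) {
		if (addr >= acpi_reclaim[i].start &&
		    addr + len - 1 <= acpi_reclaim[i].end)
			return true;
	}
	return false;
}
```

For example, covered(0x78980000, 0x24) describes the RSDP and
covered(0x78940000, 0x11DA) the DSDT; both fall inside the listed
ranges, which is why the kernel has to be able to read them at boot.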


>> So it seems that kexec does not honour the memblock_reserve() table
>> when booting the next kernel.
>
> not really.
>
>> > (In other words, can or should we skip some part of ACPI-related init code
>> > on crash dump kernel?)
>> >
>>
>> I don't think so. And the change to the handling of ACPI reclaim
>> regions only revealed the bug, not created it (given that other
>> memblock_reserve regions may be affected as well)
>
> As whether we should honor such reserved regions over kexec'ing
> depends on each one's specific nature, we will have to take care one-by-one.
> As a matter of fact, no information about "reserved" memblocks is
> exposed to user space (via proc/iomem).
>

That is why I suggested (somewhere in this thread?) to not expose them
as 'System RAM'. Do you think that could solve this?

>
>>
>> >> 6b). The kernel code currently looks like the following:
>> >>
>> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> >> index 30ad2f085d1f..867bdec7c692 100644
>> >> --- a/arch/arm64/kernel/setup.c
>> >> +++ b/arch/arm64/kernel/setup.c
>> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
>> >>  {
>> >>      struct memblock_region *region;
>> >>      struct resource *res;
>> >> +    phys_addr_t addr_start, addr_end;
>> >>
>> >>      kernel_code.start   = __pa_symbol(_text);
>> >>      kernel_code.end     = __pa_symbol(__init_begin - 1);
>> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
>> >>              res->name  = "reserved";
>> >>              res->flags = IORESOURCE_MEM;
>> >>          } else {
>> >> -            res->name  = "System RAM";
>> >> -            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> >> +            addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
>> >> +            addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
>> >> +            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
>> >> +                res->name  = "ACPI reclaim region";
>> >> +                res->flags = IORESOURCE_MEM;
>> >> +            } else {
>> >> +                res->name  = "System RAM";
>> >> +                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> >> +            }
>> >>          }
>> >> +
>> >>          res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>> >>          res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
>> >>
>> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)
>> >>
>> >>      request_standard_resources();
>> >>
>> >> +    efi_memmap_unmap();
>> >>      early_ioremap_reset();
>> >>
>> >>      if (acpi_disabled)
>> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
>> >> index 80d1a885def5..a7c522eac640 100644
>> >> --- a/drivers/firmware/efi/arm-init.c
>> >> +++ b/drivers/firmware/efi/arm-init.c
>> >> @@ -259,7 +259,6 @@ void __init efi_init(void)
>> >>
>> >>      reserve_regions();
>> >>      efi_esrt_init();
>> >> -    efi_memmap_unmap();
>> >>
>> >>      memblock_reserve(params.mmap & PAGE_MASK,
>> >>               PAGE_ALIGN(params.mmap_size +
>> >>
>> >>
>> >> After this change the ACPI reclaim regions are properly recognized in
>> >> '/proc/iomem':
>> >>
>> >> # cat /proc/iomem | grep -i ACPI
>> >> 396c0000-3975ffff : ACPI reclaim region
>> >> 39770000-397affff : ACPI reclaim region
>> >> 398a0000-398bffff : ACPI reclaim region
>> >>
>> >> 6c). I am currently changing the 'kexec-tools' and will finish the
>> >> testing over the next few days.
>> >>
>> >> I just wanted to know your opinion on this issue, so that I will be
>> >> able to propose a fix on the above lines.
>> >>
>> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to
>> >> kexec-tools.
>> >>
>> >> Thanks,
>> >> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-13 12:17                                                 ` Ard Biesheuvel
@ 2017-12-13 19:22                                                     ` Bhupesh SHARMA
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-12-13 19:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: AKASHI Takahiro, Bhupesh Sharma, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

Hi Ard, Akashi,

On Wed, Dec 13, 2017 at 5:47 PM, Ard Biesheuvel
<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On 13 December 2017 at 12:16, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>>> On 13 December 2017 at 10:26, AKASHI Takahiro
>>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> > Bhupesh, Ard,
>>> >
>>> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>>> >> Hi Ard, Akashi
>>> >>
>>> > (snip)
>>> >
>>> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>>> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>>> >> identify its own usable memory and exclude, at its boot time, any
>>> >> other memory areas that are part of the panicked kernel's memory.
>>> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>>> >> , for details)
>>> >
>>> > Right.
>>> >
>>> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>>> >> with the crashkernel memory range:
>>> >>
>>> >>                 /* add linux,usable-memory-range */
>>> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>>> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>>> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>>> >>                                 address_cells, size_cells);
>>> >>
>>> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>>> >> , for details)
>>> >>
>>> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>>> >> they are marked as System RAM or as RESERVED. As,
>>> >> 'linux,usable-memory-range' dt node is patched up only with
>>> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>>> >>
>>> >> 3). As a result when the crashkernel boots up it doesn't find this
>>> >> ACPI memory and crashes while trying to access the same:
>>> >>
>>> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>>> >>
>>> >> [snip..]
>>> >>
>>> >> Reserved memory range
>>> >> 000000000e800000-000000002e7fffff (0)
>>> >>
>>> >> Coredump memory ranges
>>> >> 0000000000000000-000000000e7fffff (0)
>>> >> 000000002e800000-000000003961ffff (0)
>>> >> 0000000039d40000-000000003ed2ffff (0)
>>> >> 000000003ed60000-000000003fbfffff (0)
>>> >> 0000001040000000-0000001ffbffffff (0)
>>> >> 0000002000000000-0000002ffbffffff (0)
>>> >> 0000009000000000-0000009ffbffffff (0)
>>> >> 000000a000000000-000000affbffffff (0)
>>> >>
>>> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>>> >> memory cap'ing passed to the crash kernel inside
>>> >> 'arch/arm64/mm/init.c' (see below):
>>> >>
>>> >> static void __init fdt_enforce_memory_region(void)
>>> >> {
>>> >>         struct memblock_region reg = {
>>> >>                 .size = 0,
>>> >>         };
>>> >>
>>> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>>> >>
>>> >>         if (reg.size)
>>> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>>> >> }
>>> >
>>> > Please just don't do that. It can cause a fatal damage on
>>> > memory contents of the *crashed* kernel.
>>> >
>>> >> 5). Both the above temporary solutions fix the problem.
>>> >>
>>> >> 6). However exposing all System RAM regions to the crashkernel is not
>>> >> advisable and may cause the crashkernel or some crashkernel drivers to
>>> >> fail.
>>> >>
>>> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>>> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>>> >> kernel code and on the other hand the user-space 'kexec-tools' will
>>> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>>> >> dt node 'linux,usable-memory-range'
>>> >
>>> > I still don't understand why we need to carry over the information
>>> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>>> > such regions are free to be reused by the kernel after some point of
>>> > initialization. Why does crash dump kernel need to know about them?
>>> >
>>>
>>> Not really. According to the UEFI spec, they can be reclaimed after
>>> the OS has initialized, i.e., when it has consumed the ACPI tables and
>>> no longer needs them. Of course, in order to be able to boot a kexec
>>> kernel, those regions needs to be preserved, which is why they are
>>> memblock_reserve()'d now.
>>
>> For my better understandings, who is actually accessing such regions
>> during boot time, uefi itself or efistub?
>>
>
> No, only the kernel. This is where the ACPI tables are stored. For
> instance, on QEMU we have
>
>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001   01000013)
>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
>
> covered by
>
>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>  ...
>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>
>
>>> So it seems that kexec does not honour the memblock_reserve() table
>>> when booting the next kernel.
>>
>> not really.
>>
>>> > (In other words, can or should we skip some part of ACPI-related init code
>>> > on crash dump kernel?)
>>> >
>>>
>>> I don't think so. And the change to the handling of ACPI reclaim
>>> regions only revealed the bug, not created it (given that other
>>> memblock_reserve regions may be affected as well)
>>
>> As whether we should honor such reserved regions over kexec'ing
>> depends on each one's specific nature, we will have to take care one-by-one.
>> As a matter of fact, no information about "reserved" memblocks is
>> exposed to user space (via proc/iomem).
>>
>
> That is why I suggested (somewhere in this thread?) to not expose them
> as 'System RAM'. Do you think that could solve this?

I agree. So how about my proposal (please see my last reply) to
expose these regions as "ACPI reclaim region" in /proc/iomem?

Please note that /proc/iomem already contains several instances where
driver regions are explicitly labelled with concise names, e.g.:

# cat /proc/iomem | grep -i serial

  1c021000-1c02101f : serial

If we expose only the ACPI reclaim regions to the crashkernel (along
with the normal crash kernel memory range), we avoid exposing all
System RAM or reserved regions to the crashkernel, which could
otherwise cause issues with crashkernel boot or crash coredump save
operations.
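
As a rough sketch of the user-space side, assuming the kernel exports
the label "ACPI reclaim region" as in the patch quoted below (that
label is part of the proposal, not an existing interface), a line of
/proc/iomem could be matched like this. match_iomem_line() is a
hypothetical helper, not an existing kexec-tools function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "/proc/iomem"-style line such as
 *   "396c0000-3975ffff : ACPI reclaim region"
 * Returns 1 and fills *start / *end if the line carries exactly the
 * wanted label, 0 otherwise.
 */
static int match_iomem_line(const char *line, const char *label,
			    uint64_t *start, uint64_t *end)
{
	uint64_t s, e;
	int consumed = 0;
	size_t len = strlen(label);
	const char *name;

	/* %n is only stored if the " : " separator was matched as well. */
	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n", &s, &e, &consumed) != 2)
		return 0;
	if (consumed == 0 || e < s)
		return 0;

	name = line + consumed;
	if (strncmp(name, label, len) != 0)
		return 0;
	/* Reject longer names that merely start with the label. */
	if (name[len] != '\0' && name[len] != '\n')
		return 0;

	*start = s;
	*end = e;
	return 1;
}
```

Fed the lines from the grep output in my last reply, this accepts the
three "ACPI reclaim region" entries and rejects entries such as
"serial" or "System RAM".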

We can then modify 'kexec-tools' accordingly to pick up these regions
along with the normal crash kernel memory range and append them to
the 'linux,usable-memory-range' dt property, so that the crash
kernel can operate on them.
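
To illustrate what appending would mean for the property's byte
layout, here is a minimal sketch that packs several ranges as
big-endian (address, size) cells. struct mem_range, put_cells() and
encode_ranges() are hypothetical names, not kexec-tools or libfdt
APIs. Note also that the fdt_enforce_memory_region() code quoted below
fills a single memblock_region, so the kernel side would presumably
need a matching change before more than one range is honoured.

```c
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t start;	/* first byte */
	uint64_t end;	/* last byte, inclusive */
};

/* Store 'value' as 'cells' big-endian 32-bit cells; returns bytes written. */
static size_t put_cells(uint8_t *out, uint64_t value, int cells)
{
	int i;

	for (i = 0; i < cells; i++) {
		int shift = 32 * (cells - 1 - i);
		uint32_t cell = shift < 64 ? (uint32_t)(value >> shift) : 0;

		out[4 * i + 0] = (uint8_t)(cell >> 24);
		out[4 * i + 1] = (uint8_t)(cell >> 16);
		out[4 * i + 2] = (uint8_t)(cell >> 8);
		out[4 * i + 3] = (uint8_t)cell;
	}
	return 4 * (size_t)cells;
}

/*
 * Pack n ranges as consecutive (base, size) pairs.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t encode_ranges(uint8_t *out, size_t out_len,
			    const struct mem_range *r, size_t n,
			    int address_cells, int size_cells)
{
	size_t need = n * 4 * (size_t)(address_cells + size_cells);
	size_t off = 0, i;

	if (need > out_len)
		return 0;

	for (i = 0; i < n; i++) {
		off += put_cells(out + off, r[i].start, address_cells);
		off += put_cells(out + off, r[i].end - r[i].start + 1,
				 size_cells);
	}
	return off;
}
```

With two address cells and two size cells, the crash kernel range
0e800000-2e7fffff plus one ACPI reclaim range 396c0000-3975ffff would
occupy 32 bytes in the property.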

If you think this is OK, I can try to send an RFC patch later this week.

Please let me know.

Regards,
Bhupesh


>>>
>>> >> 6b). The kernel code currently looks like the following:
>>> >>
>>> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> >> index 30ad2f085d1f..867bdec7c692 100644
>>> >> --- a/arch/arm64/kernel/setup.c
>>> >> +++ b/arch/arm64/kernel/setup.c
>>> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
>>> >>  {
>>> >>      struct memblock_region *region;
>>> >>      struct resource *res;
>>> >> +    phys_addr_t addr_start, addr_end;
>>> >>
>>> >>      kernel_code.start   = __pa_symbol(_text);
>>> >>      kernel_code.end     = __pa_symbol(__init_begin - 1);
>>> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
>>> >>              res->name  = "reserved";
>>> >>              res->flags = IORESOURCE_MEM;
>>> >>          } else {
>>> >> -            res->name  = "System RAM";
>>> >> -            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>> >> +            addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
>>> >> +            addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
>>> >> +            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
>>> >> +                res->name  = "ACPI reclaim region";
>>> >> +                res->flags = IORESOURCE_MEM;
>>> >> +            } else {
>>> >> +                res->name  = "System RAM";
>>> >> +                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>> >> +            }
>>> >>          }
>>> >> +
>>> >>          res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>>> >>          res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
>>> >>
>>> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)
>>> >>
>>> >>      request_standard_resources();
>>> >>
>>> >> +    efi_memmap_unmap();
>>> >>      early_ioremap_reset();
>>> >>
>>> >>      if (acpi_disabled)
>>> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
>>> >> index 80d1a885def5..a7c522eac640 100644
>>> >> --- a/drivers/firmware/efi/arm-init.c
>>> >> +++ b/drivers/firmware/efi/arm-init.c
>>> >> @@ -259,7 +259,6 @@ void __init efi_init(void)
>>> >>
>>> >>      reserve_regions();
>>> >>      efi_esrt_init();
>>> >> -    efi_memmap_unmap();
>>> >>
>>> >>      memblock_reserve(params.mmap & PAGE_MASK,
>>> >>               PAGE_ALIGN(params.mmap_size +
>>> >>
>>> >>
>>> >> After this change the ACPI reclaim regions are properly recognized in
>>> >> '/proc/iomem':
>>> >>
>>> >> # cat /proc/iomem | grep -i ACPI
>>> >> 396c0000-3975ffff : ACPI reclaim region
>>> >> 39770000-397affff : ACPI reclaim region
>>> >> 398a0000-398bffff : ACPI reclaim region
>>> >>
>>> >> 6c). I am currently changing the 'kexec-tools' and will finish the
>>> >> testing over the next few days.
>>> >>
>>> >> I just wanted to know your opinion on this issue, so that I will be
>>> >> able to propose a fix on the above lines.
>>> >>
>>> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to
>>> >> kexec-tools.
>>> >>
>>> >> Thanks,
>>> >> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-13 19:22                                                     ` Bhupesh SHARMA
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-12-13 19:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard, Akashi,

On Wed, Dec 13, 2017 at 5:47 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 13 December 2017 at 12:16, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
>> On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>>> On 13 December 2017 at 10:26, AKASHI Takahiro
>>> <takahiro.akashi@linaro.org> wrote:
>>> > Bhupesh, Ard,
>>> >
>>> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>>> >> Hi Ard, Akashi
>>> >>
>>> > (snip)
>>> >
>>> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>>> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>>> >> identify its own usable memory and exclude, at its boot time, any
>>> >> other memory areas that are part of the panicked kernel's memory.
>>> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>>> >> , for details)
>>> >
>>> > Right.
>>> >
>>> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>>> >> with the crashkernel memory range:
>>> >>
>>> >>                 /* add linux,usable-memory-range */
>>> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>>> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>>> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>>> >>                                 address_cells, size_cells);
>>> >>
>>> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>>> >> , for details)
>>> >>
>>> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>>> >> they are marked as System RAM or as RESERVED. As,
>>> >> 'linux,usable-memory-range' dt node is patched up only with
>>> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>>> >>
>>> >> 3). As a result when the crashkernel boots up it doesn't find this
>>> >> ACPI memory and crashes while trying to access the same:
>>> >>
>>> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>>> >> -r`.img --reuse-cmdline -d
>>> >>
>>> >> [snip..]
>>> >>
>>> >> Reserved memory range
>>> >> 000000000e800000-000000002e7fffff (0)
>>> >>
>>> >> Coredump memory ranges
>>> >> 0000000000000000-000000000e7fffff (0)
>>> >> 000000002e800000-000000003961ffff (0)
>>> >> 0000000039d40000-000000003ed2ffff (0)
>>> >> 000000003ed60000-000000003fbfffff (0)
>>> >> 0000001040000000-0000001ffbffffff (0)
>>> >> 0000002000000000-0000002ffbffffff (0)
>>> >> 0000009000000000-0000009ffbffffff (0)
>>> >> 000000a000000000-000000affbffffff (0)
>>> >>
>>> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>>> >> memory cap'ing passed to the crash kernel inside
>>> >> 'arch/arm64/mm/init.c' (see below):
>>> >>
>>> >> static void __init fdt_enforce_memory_region(void)
>>> >> {
>>> >>         struct memblock_region reg = {
>>> >>                 .size = 0,
>>> >>         };
>>> >>
>>> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>>> >>
>>> >>         if (reg.size)
>>> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>>> >> comment this out */
>>> >> }
>>> >
>>> > Please just don't do that. It can cause a fatal damage on
>>> > memory contents of the *crashed* kernel.
>>> >
>>> >> 5). Both the above temporary solutions fix the problem.
>>> >>
>>> >> 6). However exposing all System RAM regions to the crashkernel is not
>>> >> advisable and may cause the crashkernel or some crashkernel drivers to
>>> >> fail.
>>> >>
>>> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>>> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>>> >> kernel code and on the other hand the user-space 'kexec-tools' will
>>> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>>> >> dt node 'linux,usable-memory-range'
>>> >
>>> > I still don't understand why we need to carry over the information
>>> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>>> > such regions are free to be reused by the kernel after some point of
>>> > initialization. Why does crash dump kernel need to know about them?
>>> >
>>>
>>> Not really. According to the UEFI spec, they can be reclaimed after
>>> the OS has initialized, i.e., when it has consumed the ACPI tables and
>>> no longer needs them. Of course, in order to be able to boot a kexec
>>> kernel, those regions needs to be preserved, which is why they are
>>> memblock_reserve()'d now.
>>
>> For my better understandings, who is actually accessing such regions
>> during boot time, uefi itself or efistub?
>>
>
> No, only the kernel. This is where the ACPI tables are stored. For
> instance, on QEMU we have
>
>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>   01000013)
>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> BXPC 00000001)
>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> BXPC 00000001)
>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> BXPC 00000001)
>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> BXPC 00000001)
>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> BXPC 00000001)
>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> BXPC 00000001)
>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> BXPC 00000001)
>
> covered by
>
>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>  ...
>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>
>
>>> So it seems that kexec does not honour the memblock_reserve() table
>>> when booting the next kernel.
>>
>> not really.
>>
>>> > (In other words, can or should we skip some part of ACPI-related init code
>>> > on crash dump kernel?)
>>> >
>>>
>>> I don't think so. And the change to the handling of ACPI reclaim
>>> regions only revealed the bug, not created it (given that other
>>> memblock_reserve regions may be affected as well)
>>
>> As whether we should honor such reserved regions over kexec'ing
>> depends on each one's specific nature, we will have to take care one-by-one.
>> As a matter of fact, no information about "reserved" memblocks is
>> exposed to user space (via proc/iomem).
>>
>
> That is why I suggested (somewhere in this thread?) to not expose them
> as 'System RAM'. Do you think that could solve this?

I agree. So how about my proposal (please see my last reply): expose
these regions as "ACPI reclaim region" entries in /proc/iomem?

Please note that /proc/iomem already labels several driver regions
explicitly with concise names, e.g.:

# cat /proc/iomem | grep -i serial

  1c021000-1c02101f : serial

If we expose only the ACPI reclaim regions to the crashkernel (along
with the normal crash kernel memory range), we avoid exposing all
System RAM or reserved regions to the crashkernel, which could
otherwise cause issues with crashkernel boot or with saving the crash
coredump.

We can then modify 'kexec-tools' accordingly to pick up these regions
along with the normal crash kernel memory range and append them to the
'linux,usable-memory-range' dt property, so that the crash kernel can
operate on them.
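
As a rough sketch of the kexec-tools side (not the actual patch: the
struct and function names below are made up for illustration, and it
assumes the kernel exposes the regions under the name "ACPI reclaim
region" as in the /proc/iomem output quoted in 6b), the scan could look
something like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical range type; kexec-tools has its own memory range struct. */
struct mem_range {
	unsigned long long start;
	unsigned long long end;
};

/* Resource name assumed from the /proc/iomem output in 6b). */
#define ACPI_RECLAIM_NAME "ACPI reclaim region"

/*
 * Parse one /proc/iomem-style line ("start-end : name").
 * Returns 1 and fills *r if the line names an ACPI reclaim region,
 * 0 otherwise.
 */
static int parse_acpi_reclaim_line(const char *line, struct mem_range *r)
{
	unsigned long long start, end;
	int consumed = 0;

	assert(line != NULL && r != NULL);

	/* %n is only stored if everything up to " : " matched. */
	if (sscanf(line, " %llx-%llx : %n", &start, &end, &consumed) < 2)
		return 0;
	if (consumed == 0)
		return 0;
	if (strncmp(line + consumed, ACPI_RECLAIM_NAME,
		    strlen(ACPI_RECLAIM_NAME)) != 0)
		return 0;

	r->start = start;
	r->end = end;
	return 1;
}

/*
 * Build the list of ranges intended for 'linux,usable-memory-range':
 * the crash kernel range first, then every ACPI reclaim region found
 * in the given iomem lines. Returns the number of entries written to
 * out[] (at most max).
 */
static size_t collect_usable_ranges(const struct mem_range *crash,
				    const char *const *lines, size_t nlines,
				    struct mem_range *out, size_t max)
{
	size_t n = 0, i;

	if (max == 0)
		return 0;

	out[n++] = *crash;
	for (i = 0; i < nlines && n < max; i++)
		if (parse_acpi_reclaim_line(lines[i], &out[n]))
			n++;

	return n;
}
```

Note that the kernel code quoted in 4) reads a single base/size pair
from the property, so the kernel side would also have to be taught to
accept more than one range for this to work.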

If you think this is OK, I can try to send an RFC patch later this week.

Please let me know.

Regards,
Bhupesh


>>>
>>> >> 6b). The kernel code currently looks like the following:
>>> >>
>>> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> >> index 30ad2f085d1f..867bdec7c692 100644
>>> >> --- a/arch/arm64/kernel/setup.c
>>> >> +++ b/arch/arm64/kernel/setup.c
>>> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
>>> >>  {
>>> >>      struct memblock_region *region;
>>> >>      struct resource *res;
>>> >> +    phys_addr_t addr_start, addr_end;
>>> >>
>>> >>      kernel_code.start   = __pa_symbol(_text);
>>> >>      kernel_code.end     = __pa_symbol(__init_begin - 1);
>>> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
>>> >>              res->name  = "reserved";
>>> >>              res->flags = IORESOURCE_MEM;
>>> >>          } else {
>>> >> -            res->name  = "System RAM";
>>> >> -            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>> >> +            addr_start =
>>> >> __pfn_to_phys(memblock_region_reserved_base_pfn(region));
>>> >> +            addr_end =
>>> >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
>>> >> +            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY)
>>> >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
>>> >> +                res->name  = "ACPI reclaim region";
>>> >> +                res->flags = IORESOURCE_MEM;
>>> >> +            } else {
>>> >> +                res->name  = "System RAM";
>>> >> +                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>> >> +            }
>>> >>          }
>>> >> +
>>> >>          res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
>>> >>          res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
>>> >>
>>> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)
>>> >>
>>> >>      request_standard_resources();
>>> >>
>>> >> +    efi_memmap_unmap();
>>> >>      early_ioremap_reset();
>>> >>
>>> >>      if (acpi_disabled)
>>> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
>>> >> index 80d1a885def5..a7c522eac640 100644
>>> >> --- a/drivers/firmware/efi/arm-init.c
>>> >> +++ b/drivers/firmware/efi/arm-init.c
>>> >> @@ -259,7 +259,6 @@ void __init efi_init(void)
>>> >>
>>> >>      reserve_regions();
>>> >>      efi_esrt_init();
>>> >> -    efi_memmap_unmap();
>>> >>
>>> >>      memblock_reserve(params.mmap & PAGE_MASK,
>>> >>               PAGE_ALIGN(params.mmap_size +
>>> >>
>>> >>
>>> >> After this change the ACPI reclaim regions are properly recognized in
>>> >> '/proc/iomem':
>>> >>
>>> >> # cat /proc/iomem | grep -i ACPI
>>> >> 396c0000-3975ffff : ACPI reclaim region
>>> >> 39770000-397affff : ACPI reclaim region
>>> >> 398a0000-398bffff : ACPI reclaim region
>>> >>
>>> >> 6c). I am currently changing the 'kexec-tools' and will finish the
>>> >> testing over the next few days.
>>> >>
>>> >> I just wanted to know your opinion on this issue, so that I will be
>>> >> able to propose a fix on the above lines.
>>> >>
>>> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to
>>> >> kexec-tools.
>>> >>
>>> >> Thanks,
>>> >> Bhupesh

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-13 12:17                                                 ` Ard Biesheuvel
@ 2017-12-15  8:59                                                     ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-15  8:59 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> On 13 December 2017 at 12:16, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > Bhupesh, Ard,
> >> >
> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> Hi Ard, Akashi
> >> >>
> >> > (snip)
> >> >
> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> , for details)
> >> >
> >> > Right.
> >> >
> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> with the crashkernel memory range:
> >> >>
> >> >>                 /* add linux,usable-memory-range */
> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>                                 address_cells, size_cells);
> >> >>
> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> , for details)
> >> >>
> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> they are marked as System RAM or as RESERVED. As,
> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>
> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> ACPI memory and crashes while trying to access the same:
> >> >>
> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >> -r`.img --reuse-cmdline -d
> >> >>
> >> >> [snip..]
> >> >>
> >> >> Reserved memory range
> >> >> 000000000e800000-000000002e7fffff (0)
> >> >>
> >> >> Coredump memory ranges
> >> >> 0000000000000000-000000000e7fffff (0)
> >> >> 000000002e800000-000000003961ffff (0)
> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> 000000a000000000-000000affbffffff (0)
> >> >>
> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> memory cap'ing passed to the crash kernel inside
> >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>
> >> >> static void __init fdt_enforce_memory_region(void)
> >> >> {
> >> >>         struct memblock_region reg = {
> >> >>                 .size = 0,
> >> >>         };
> >> >>
> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>
> >> >>         if (reg.size)
> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >> comment this out */
> >> >> }
> >> >
> >> > Please just don't do that. It can cause a fatal damage on
> >> > memory contents of the *crashed* kernel.
> >> >
> >> >> 5). Both the above temporary solutions fix the problem.
> >> >>
> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> fail.
> >> >>
> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> dt node 'linux,usable-memory-range'
> >> >
> >> > I still don't understand why we need to carry over the information
> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> > such regions are free to be reused by the kernel after some point of
> >> > initialization. Why does crash dump kernel need to know about them?
> >> >
> >>
> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> kernel, those regions needs to be preserved, which is why they are
> >> memblock_reserve()'d now.
> >
> > For my better understandings, who is actually accessing such regions
> > during boot time, uefi itself or efistub?
> >
> 
> No, only the kernel. This is where the ACPI tables are stored. For
> instance, on QEMU we have
> 
>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>   01000013)
>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> BXPC 00000001)
>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> BXPC 00000001)
>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> BXPC 00000001)
>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> BXPC 00000001)
>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> BXPC 00000001)
>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> BXPC 00000001)
>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> BXPC 00000001)
> 
> covered by
> 
>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>  ...
>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]

OK. I had mistakenly understood that those regions could be freed after
exiting UEFI boot services.

> 
> >> So it seems that kexec does not honour the memblock_reserve() table
> >> when booting the next kernel.
> >
> > not really.
> >
> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> > on crash dump kernel?)
> >> >
> >>
> >> I don't think so. And the change to the handling of ACPI reclaim
> >> regions only revealed the bug, not created it (given that other
> >> memblock_reserve regions may be affected as well)
> >
> > As whether we should honor such reserved regions over kexec'ing
> > depends on each one's specific nature, we will have to take care one-by-one.
> > As a matter of fact, no information about "reserved" memblocks is
> > exposed to user space (via proc/iomem).
> >
> 
> That is why I suggested (somewhere in this thread?) to not expose them
> as 'System RAM'. Do you think that could solve this?

Memblock-reserving them is necessary to prevent their corruption, and
marking them under another name in /proc/iomem would also be good so that
they are not allocated as part of the crash kernel's memory.

But I'm still not convinced that we should export them in
usable-memory-range to the crash dump kernel. They will be accessed through
acpi_os_map_memory() and so won't need to be part of System RAM
(or memblocks), I guess.
	-> Bhupesh?

Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
via a kernel command line parameter, "memmap=".
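
For illustration only: the kernel parameter documentation describes
"memmap=nn[KMG]#ss[KMG]" as marking the region from ss to ss+nn as ACPI
data, so a tool could build such an argument roughly as below. This is a
sketch, not the kexec-tools code; the function name is made up, and it
assumes the range is 1K-aligned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "memmap=<size>K#<start>K" for the inclusive range [start, end].
 * Hypothetical helper; assumes start and the size are multiples of 1K.
 * Returns the snprintf() result (the length that was, or would have
 * been, written).
 */
static int format_memmap_acpi(char *buf, size_t len,
			      unsigned long long start,
			      unsigned long long end)
{
	unsigned long long size_k, start_k;

	assert(buf != NULL && end >= start);

	size_k = (end - start + 1) >> 10;	/* bytes -> KiB */
	start_k = start >> 10;

	return snprintf(buf, len, "memmap=%lluK#%lluK", size_k, start_k);
}
```

For the first region in Bhupesh's /proc/iomem output
(396c0000-3975ffff) this gives a 640K region starting at 940800K.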

Thanks,
-Takahiro AKASHI


> >
> >>
> >> >> 6b). The kernel code currently looks like the following:
> >> >>
> >> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >> >> index 30ad2f085d1f..867bdec7c692 100644
> >> >> --- a/arch/arm64/kernel/setup.c
> >> >> +++ b/arch/arm64/kernel/setup.c
> >> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
> >> >>  {
> >> >>      struct memblock_region *region;
> >> >>      struct resource *res;
> >> >> +    phys_addr_t addr_start, addr_end;
> >> >>
> >> >>      kernel_code.start   = __pa_symbol(_text);
> >> >>      kernel_code.end     = __pa_symbol(__init_begin - 1);
> >> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
> >> >>              res->name  = "reserved";
> >> >>              res->flags = IORESOURCE_MEM;
> >> >>          } else {
> >> >> -            res->name  = "System RAM";
> >> >> -            res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >> >> +            addr_start =
> >> >> __pfn_to_phys(memblock_region_reserved_base_pfn(region));
> >> >> +            addr_end =
> >> >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
> >> >> +            if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY)
> >> >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
> >> >> +                res->name  = "ACPI reclaim region";
> >> >> +                res->flags = IORESOURCE_MEM;
> >> >> +            } else {
> >> >> +                res->name  = "System RAM";
> >> >> +                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >> >> +            }
> >> >>          }
> >> >> +
> >> >>          res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> >> >>          res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> >> >>
> >> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)
> >> >>
> >> >>      request_standard_resources();
> >> >>
> >> >> +    efi_memmap_unmap();
> >> >>      early_ioremap_reset();
> >> >>
> >> >>      if (acpi_disabled)
> >> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> >> >> index 80d1a885def5..a7c522eac640 100644
> >> >> --- a/drivers/firmware/efi/arm-init.c
> >> >> +++ b/drivers/firmware/efi/arm-init.c
> >> >> @@ -259,7 +259,6 @@ void __init efi_init(void)
> >> >>
> >> >>      reserve_regions();
> >> >>      efi_esrt_init();
> >> >> -    efi_memmap_unmap();
> >> >>
> >> >>      memblock_reserve(params.mmap & PAGE_MASK,
> >> >>               PAGE_ALIGN(params.mmap_size +
> >> >>
> >> >>
> >> >> After this change the ACPI reclaim regions are properly recognized in
> >> >> '/proc/iomem':
> >> >>
> >> >> # cat /proc/iomem | grep -i ACPI
> >> >> 396c0000-3975ffff : ACPI reclaim region
> >> >> 39770000-397affff : ACPI reclaim region
> >> >> 398a0000-398bffff : ACPI reclaim region
> >> >>
> >> >> 6c). I am currently changing the 'kexec-tools' and will finish the
> >> >> testing over the next few days.
> >> >>
> >> >> I just wanted to know your opinion on this issue, so that I will be
> >> >> able to propose a fix on the above lines.
> >> >>
> >> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to
> >> >> kexec-tools.
> >> >>
> >> >> Thanks,
> >> >> Bhupesh


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-15  8:59                                                     ` AKASHI Takahiro
@ 2017-12-15  9:35                                                       ` Ard Biesheuvel
  -1 siblings, 0 replies; 135+ messages in thread
From: Ard Biesheuvel @ 2017-12-15  9:35 UTC (permalink / raw)
  To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On 15 December 2017 at 09:59, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> > Bhupesh, Ard,
>> >> >
>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> >> Hi Ard, Akashi
>> >> >>
>> >> > (snip)
>> >> >
>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> >> , for details)
>> >> >
>> >> > Right.
>> >> >
>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> >> with the crashkernel memory range:
>> >> >>
>> >> >>                 /* add linux,usable-memory-range */
>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> >>                                 address_cells, size_cells);
>> >> >>
>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> >> , for details)
>> >> >>
>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> >> they are marked as System RAM or as RESERVED. As,
>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >> >>
>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> >> ACPI memory and crashes while trying to access the same:
>> >> >>
>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> >> -r`.img --reuse-cmdline -d
>> >> >>
>> >> >> [snip..]
>> >> >>
>> >> >> Reserved memory range
>> >> >> 000000000e800000-000000002e7fffff (0)
>> >> >>
>> >> >> Coredump memory ranges
>> >> >> 0000000000000000-000000000e7fffff (0)
>> >> >> 000000002e800000-000000003961ffff (0)
>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >> 000000a000000000-000000affbffffff (0)
>> >> >>
>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >> memory cap'ing passed to the crash kernel inside
>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >>
>> >> >> static void __init fdt_enforce_memory_region(void)
>> >> >> {
>> >> >>         struct memblock_region reg = {
>> >> >>                 .size = 0,
>> >> >>         };
>> >> >>
>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >>
>> >> >>         if (reg.size)
>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >> comment this out */
>> >> >> }
>> >> >
>> >> > Please just don't do that. It can cause a fatal damage on
>> >> > memory contents of the *crashed* kernel.
>> >> >
>> >> >> 5). Both the above temporary solutions fix the problem.
>> >> >>
>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >> fail.
>> >> >>
>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> >> dt node 'linux,usable-memory-range'
>> >> >
>> >> > I still don't understand why we need to carry over the information
>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >> > such regions are free to be reused by the kernel after some point of
>> >> > initialization. Why does crash dump kernel need to know about them?
>> >> >
>> >>
>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> kernel, those regions needs to be preserved, which is why they are
>> >> memblock_reserve()'d now.
>> >
>> > For my better understandings, who is actually accessing such regions
>> > during boot time, uefi itself or efistub?
>> >
>>
>> No, only the kernel. This is where the ACPI tables are stored. For
>> instance, on QEMU we have
>>
>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>>   01000013)
>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> BXPC 00000001)
>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> BXPC 00000001)
>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> BXPC 00000001)
>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> BXPC 00000001)
>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> BXPC 00000001)
>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> BXPC 00000001)
>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> BXPC 00000001)
>>
>> covered by
>>
>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>>  ...
>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>
> OK. I mistakenly understood those regions could be freed after exiting
> UEFI boot services.
>
>>
>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >> when booting the next kernel.
>> >
>> > not really.
>> >
>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> > on crash dump kernel?)
>> >> >
>> >>
>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >> regions only revealed the bug, not created it (given that other
>> >> memblock_reserve regions may be affected as well)
>> >
>> > As whether we should honor such reserved regions over kexec'ing
>> > depends on each one's specific nature, we will have to take care one-by-one.
>> > As a matter of fact, no information about "reserved" memblocks is
>> > exposed to user space (via proc/iomem).
>> >
>>
>> That is why I suggested (somewhere in this thread?) to not expose them
>> as 'System RAM'. Do you think that could solve this?
>
> Memblock-reserv'ing them is necessary to prevent their corruption and
> marking them under another name in /proc/iomem would also be good in order
> not to allocate them as part of crash kernel's memory.
>

I agree. However, this may not be entirely trivial, since iterating
over the memblock_reserved table and creating iomem entries may result
in collisions.

> But I'm not still convinced that we should export them in useable-
> memory-range to crash dump kernel. They will be accessed through
> acpi_os_map_memory() and so won't be required to be part of system ram
> (or memblocks), I guess.

Agreed. They will be covered by the linear mapping in the boot kernel,
and be mapped explicitly via ioremap_cache() in the kexec kernel,
which is exactly what we want in this case.

> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> via a kernel command line parameter, "memmap=".
>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-15  9:35                                                       ` Ard Biesheuvel
@ 2017-12-17 21:01                                                           ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-17 21:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: AKASHI Takahiro, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On 15 December 2017 at 09:59, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>>> On 13 December 2017 at 12:16, AKASHI Takahiro
>>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> >> > Bhupesh, Ard,
>>> >> >
>>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>>> >> >> Hi Ard, Akashi
>>> >> >>
>>> >> > (snip)
>>> >> >
>>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>>> >> >> identify its own usable memory and exclude, at its boot time, any
>>> >> >> other memory areas that are part of the panicked kernel's memory.
>>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>>> >> >> , for details)
>>> >> >
>>> >> > Right.
>>> >> >
>>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>>> >> >> with the crashkernel memory range:
>>> >> >>
>>> >> >>                 /* add linux,usable-memory-range */
>>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>>> >> >>                                 address_cells, size_cells);
>>> >> >>
>>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>>> >> >> , for details)
>>> >> >>
>>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>>> >> >> they are marked as System RAM or as RESERVED. As,
>>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>>> >> >>
>>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>>> >> >> ACPI memory and crashes while trying to access the same:
>>> >> >>
>>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>>> >> >> -r`.img --reuse-cmdline -d
>>> >> >>
>>> >> >> [snip..]
>>> >> >>
>>> >> >> Reserved memory range
>>> >> >> 000000000e800000-000000002e7fffff (0)
>>> >> >>
>>> >> >> Coredump memory ranges
>>> >> >> 0000000000000000-000000000e7fffff (0)
>>> >> >> 000000002e800000-000000003961ffff (0)
>>> >> >> 0000000039d40000-000000003ed2ffff (0)
>>> >> >> 000000003ed60000-000000003fbfffff (0)
>>> >> >> 0000001040000000-0000001ffbffffff (0)
>>> >> >> 0000002000000000-0000002ffbffffff (0)
>>> >> >> 0000009000000000-0000009ffbffffff (0)
>>> >> >> 000000a000000000-000000affbffffff (0)
>>> >> >>
>>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>>> >> >> memory cap'ing passed to the crash kernel inside
>>> >> >> 'arch/arm64/mm/init.c' (see below):
>>> >> >>
>>> >> >> static void __init fdt_enforce_memory_region(void)
>>> >> >> {
>>> >> >>         struct memblock_region reg = {
>>> >> >>                 .size = 0,
>>> >> >>         };
>>> >> >>
>>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>>> >> >>
>>> >> >>         if (reg.size)
>>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>>> >> >> comment this out */
>>> >> >> }
>>> >> >
>>> >> > Please just don't do that. It can cause a fatal damage on
>>> >> > memory contents of the *crashed* kernel.
>>> >> >
>>> >> >> 5). Both the above temporary solutions fix the problem.
>>> >> >>
>>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>>> >> >> fail.
>>> >> >>
>>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>>> >> >> dt node 'linux,usable-memory-range'
>>> >> >
>>> >> > I still don't understand why we need to carry over the information
>>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>>> >> > such regions are free to be reused by the kernel after some point of
>>> >> > initialization. Why does crash dump kernel need to know about them?
>>> >> >
>>> >>
>>> >> Not really. According to the UEFI spec, they can be reclaimed after
>>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>>> >> no longer needs them. Of course, in order to be able to boot a kexec
>>> >> kernel, those regions needs to be preserved, which is why they are
>>> >> memblock_reserve()'d now.
>>> >
>>> > For my better understandings, who is actually accessing such regions
>>> > during boot time, uefi itself or efistub?
>>> >
>>>
>>> No, only the kernel. This is where the ACPI tables are stored. For
>>> instance, on QEMU we have
>>>
>>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>>>   01000013)
>>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>>> BXPC 00000001)
>>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>>> BXPC 00000001)
>>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>>> BXPC 00000001)
>>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>>> BXPC 00000001)
>>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>>> BXPC 00000001)
>>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>>> BXPC 00000001)
>>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>>> BXPC 00000001)
>>>
>>> covered by
>>>
>>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>>>  ...
>>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>>
>> OK. I mistakenly understood those regions could be freed after exiting
>> UEFI boot services.
>>
>>>
>>> >> So it seems that kexec does not honour the memblock_reserve() table
>>> >> when booting the next kernel.
>>> >
>>> > not really.
>>> >
>>> >> > (In other words, can or should we skip some part of ACPI-related init code
>>> >> > on crash dump kernel?)
>>> >> >
>>> >>
>>> >> I don't think so. And the change to the handling of ACPI reclaim
>>> >> regions only revealed the bug, not created it (given that other
>>> >> memblock_reserve regions may be affected as well)
>>> >
>>> > As whether we should honor such reserved regions over kexec'ing
>>> > depends on each one's specific nature, we will have to take care one-by-one.
>>> > As a matter of fact, no information about "reserved" memblocks is
>>> > exposed to user space (via proc/iomem).
>>> >
>>>
>>> That is why I suggested (somewhere in this thread?) to not expose them
>>> as 'System RAM'. Do you think that could solve this?
>>
>> Memblock-reserv'ing them is necessary to prevent their corruption and
>> marking them under another name in /proc/iomem would also be good in order
>> not to allocate them as part of crash kernel's memory.
>>
>
> I agree. However, this may not be entirely trivial, since iterating
> over the memblock_reserved table and creating iomem entries may result
> in collisions.

I found a method (using the patch I shared earlier in this thread) to mark these
entries as 'ACPI reclaim memory' ranges rather than System RAM or
reserved regions.

>> But I'm not still convinced that we should export them in useable-
>> memory-range to crash dump kernel. They will be accessed through
>> acpi_os_map_memory() and so won't be required to be part of system ram
>> (or memblocks), I guess.
>
> Agreed. They will be covered by the linear mapping in the boot kernel,
> and be mapped explicitly via ioremap_cache() in the kexec kernel,
> which is exactly what we want in this case.

Now this is what is confusing me. I don't see the above happening.

I see that the primary kernel boots up and adds the ACPI regions via:
acpi_os_ioremap
    -> ioremap_cache

But during the crashkernel boot, 'acpi_os_ioremap' calls
'ioremap' for the ACPI Reclaim Memory regions and not the _cache
variant.

And it fails while accessing the ACPI tables:

[    0.039205] ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
[    0.095098] Internal error: Oops: 96000021 [#1] SMP
[    0.100022] Modules linked in:
[    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
[    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
[    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
[    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
[    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045
[    0.132647] sp : ffff000008ccfb40
[    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
[    0.141354] x27: ffff0000088be820 x26: 0000000000000000
[    0.146718] x25: 000000000000001b x24: 0000000000000001
[    0.152083] x23: 0000000000000001 x22: ffff000009710027
[    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
[    0.162812] x19: 000000000000001b x18: 0000000000000005
[    0.168176] x17: 0000000000000000 x16: 0000000000000000
[    0.173541] x15: 0000000000000000 x14: 000000000000038e
[    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
[    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
[    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
[    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
[    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
[    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
[    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
[    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
[    0.223224] Call trace:
[    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
[    0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0
[    0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000
[    0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006
[    0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
[    0.263843] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
[    0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001
[    0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
[    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40
[    0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18
[    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764
[    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
[    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
[    0.394500] ---[ end trace c46ed37f9651c58e ]---
[    0.399160] Kernel panic - not syncing: Fatal exception
[    0.404437] Rebooting in 10 seconds.

So, I think the linear mapping done by the primary kernel does not
make these accessible in the crash kernel directly.

Any pointers?

Regards,
Bhupesh

>> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> via a kernel command line parameter, "memmap=".
>>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-17 21:01                                                           ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-17 21:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 15 December 2017 at 09:59, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
>> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>>> On 13 December 2017 at 12:16, AKASHI Takahiro
>>> <takahiro.akashi@linaro.org> wrote:
>>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>>> >> <takahiro.akashi@linaro.org> wrote:
>>> >> > Bhupesh, Ard,
>>> >> >
>>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>>> >> >> Hi Ard, Akashi
>>> >> >>
>>> >> > (snip)
>>> >> >
>>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>>> >> >> identify its own usable memory and exclude, at its boot time, any
>>> >> >> other memory areas that are part of the panicked kernel's memory.
>>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>>> >> >> , for details)
>>> >> >
>>> >> > Right.
>>> >> >
>>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>>> >> >> with the crashkernel memory range:
>>> >> >>
>>> >> >>                 /* add linux,usable-memory-range */
>>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>>> >> >>                                 address_cells, size_cells);
>>> >> >>
>>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>>> >> >> , for details)
>>> >> >>
>>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>>> >> >> they are marked as System RAM or as RESERVED. As,
>>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>>> >> >>
>>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>>> >> >> ACPI memory and crashes while trying to access the same:
>>> >> >>
>>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>>> >> >> -r`.img --reuse-cmdline -d
>>> >> >>
>>> >> >> [snip..]
>>> >> >>
>>> >> >> Reserved memory range
>>> >> >> 000000000e800000-000000002e7fffff (0)
>>> >> >>
>>> >> >> Coredump memory ranges
>>> >> >> 0000000000000000-000000000e7fffff (0)
>>> >> >> 000000002e800000-000000003961ffff (0)
>>> >> >> 0000000039d40000-000000003ed2ffff (0)
>>> >> >> 000000003ed60000-000000003fbfffff (0)
>>> >> >> 0000001040000000-0000001ffbffffff (0)
>>> >> >> 0000002000000000-0000002ffbffffff (0)
>>> >> >> 0000009000000000-0000009ffbffffff (0)
>>> >> >> 000000a000000000-000000affbffffff (0)
>>> >> >>
>>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>>> >> >> memory cap'ing passed to the crash kernel inside
>>> >> >> 'arch/arm64/mm/init.c' (see below):
>>> >> >>
>>> >> >> static void __init fdt_enforce_memory_region(void)
>>> >> >> {
>>> >> >>         struct memblock_region reg = {
>>> >> >>                 .size = 0,
>>> >> >>         };
>>> >> >>
>>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>>> >> >>
>>> >> >>         if (reg.size)
>>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>>> >> >> comment this out */
>>> >> >> }
>>> >> >
>>> >> > Please just don't do that. It can cause a fatal damage on
>>> >> > memory contents of the *crashed* kernel.
>>> >> >
>>> >> >> 5). Both the above temporary solutions fix the problem.
>>> >> >>
>>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>>> >> >> fail.
>>> >> >>
>>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>>> >> >> dt node 'linux,usable-memory-range'
>>> >> >
>>> >> > I still don't understand why we need to carry over the information
>>> >> > about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
>>> >> > such regions are free to be reused by the kernel after some point of
>>> >> > initialization. Why does crash dump kernel need to know about them?
>>> >> >
>>> >>
>>> >> Not really. According to the UEFI spec, they can be reclaimed after
>>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>>> >> no longer needs them. Of course, in order to be able to boot a kexec
>>> >> kernel, those regions need to be preserved, which is why they are
>>> >> memblock_reserve()'d now.
>>> >
>>> > For my better understanding, who is actually accessing such regions
>>> > during boot time, uefi itself or efistub?
>>> >
>>>
>>> No, only the kernel. This is where the ACPI tables are stored. For
>>> instance, on QEMU we have
>>>
>>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>>>   01000013)
>>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>>> BXPC 00000001)
>>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>>> BXPC 00000001)
>>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>>> BXPC 00000001)
>>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>>> BXPC 00000001)
>>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>>> BXPC 00000001)
>>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>>> BXPC 00000001)
>>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>>> BXPC 00000001)
>>>
>>> covered by
>>>
>>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>>>  ...
>>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>>
>> OK. I mistakenly understood those regions could be freed after exiting
>> UEFI boot services.
>>
>>>
>>> >> So it seems that kexec does not honour the memblock_reserve() table
>>> >> when booting the next kernel.
>>> >
>>> > not really.
>>> >
>>> >> > (In other words, can or should we skip some part of ACPI-related init code
>>> >> > on crash dump kernel?)
>>> >> >
>>> >>
>>> >> I don't think so. And the change to the handling of ACPI reclaim
>>> >> regions only revealed the bug, not created it (given that other
>>> >> memblock_reserve regions may be affected as well)
>>> >
>>> > As whether we should honor such reserved regions over kexec'ing
>>> > depends on each one's specific nature, we will have to take care one-by-one.
>>> > As a matter of fact, no information about "reserved" memblocks is
>>> > exposed to user space (via proc/iomem).
>>> >
>>>
>>> That is why I suggested (somewhere in this thread?) to not expose them
>>> as 'System RAM'. Do you think that could solve this?
>>
>> Memblock-reserv'ing them is necessary to prevent their corruption and
>> marking them under another name in /proc/iomem would also be good in order
>> not to allocate them as part of crash kernel's memory.
>>
>
> I agree. However, this may not be entirely trivial, since iterating
> over the memblock_reserved table and creating iomem entries may result
> in collisions.

I found a method (using the patch I shared earlier in this thread) to mark these
entries as 'ACPI reclaim memory' ranges rather than System RAM or
reserved regions.
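
For illustration only, this is roughly what the user-space half of that idea
amounts to: scanning /proc/iomem-style text for entries that carry a given
name. It is a Python sketch, not kexec-tools code. The sample text is made up
for the example (only the crash kernel window matches the kexec -d output
quoted above), and ranges_named() is a hypothetical helper.

```python
import re

# Made-up /proc/iomem excerpt. Only the "Crash kernel" window is taken from
# the kexec -d output in this thread; every other entry is hypothetical.
SAMPLE_IOMEM = """\
00000000-3961ffff : System RAM
  0e800000-2e7fffff : Crash kernel
39620000-3970ffff : reserved
39710000-3971ffff : ACPI reclaim memory
39d40000-3ed2ffff : System RAM
"""

# One entry per line: "<start>-<end> : <name>", hex without a 0x prefix,
# with optional leading indentation for nested resources.
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def ranges_named(iomem_text, label):
    """Return inclusive (start, end) pairs for entries whose name equals label."""
    found = []
    for line in iomem_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(3) == label:
            found.append((int(m.group(1), 16), int(m.group(2), 16)))
    return found


for start, end in ranges_named(SAMPLE_IOMEM, "ACPI reclaim memory"):
    print(f"{start:016x}-{end:016x}")
```

On a live system the same helper could be fed the contents of /proc/iomem;
as far as I know that file only shows real addresses to privileged users.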

>> But I'm still not convinced that we should export them in
>> usable-memory-range to the crash dump kernel. They will be accessed through
>> acpi_os_map_memory() and so won't need to be part of system RAM
>> (or memblocks), I guess.
>
> Agreed. They will be covered by the linear mapping in the boot kernel,
> and be mapped explicitly via ioremap_cache() in the kexec kernel,
> which is exactly what we want in this case.

Now this is what is confusing me. I don't see the above happening.

I see that the primary kernel boots up and maps the ACPI regions via:
acpi_os_ioremap
    -> ioremap_cache

But during the crashkernel boot, 'acpi_os_ioremap' calls
'ioremap' for the ACPI Reclaim Memory regions and not the _cache
variant.
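
One possible explanation, sketched as a toy model in Python (not kernel code):
as far as I can tell, the arm64 acpi_os_ioremap() of this period picks
ioremap_cache() only when memblock_is_memory() reports the physical address as
RAM, and ioremap() otherwise. Assuming that, and assuming the crash kernel's
memblock.memory has been capped to the 'linux,usable-memory-range' window, any
ACPI table outside that window would end up with a Device mapping. The window
is the "Reserved memory range" from the kexec -d output, the table address is
derived from the *pte value in the Oops that follows, and the function names
are hypothetical stand-ins.

```python
# Toy model, not kernel code. Assumption: after memblock_cap_memory_range()
# the crash kernel only knows about the usable-memory-range window as RAM.
USABLE_START = 0x0E800000   # "Reserved memory range" start from kexec -d
USABLE_END = 0x2E7FFFFF     # inclusive end of the same range


def is_memblock_memory(phys):
    """Hypothetical stand-in for memblock_is_memory() in the crash kernel."""
    return USABLE_START <= phys <= USABLE_END


def acpi_mapping_kind(phys):
    """Cached mapping if the address counts as RAM, Device mapping otherwise."""
    return "ioremap_cache" if is_memblock_memory(phys) else "ioremap"


# Output address of the "*pte=" value printed in the Oops, taking bits [47:12].
PTE = 0x00E8000039710707
table_phys = PTE & 0x0000FFFFFFFFF000

print(hex(table_phys), "->", acpi_mapping_kind(table_phys))
```

Under these assumptions the page at 0x39710000 lies outside the crash
kernel's window, so the model selects plain ioremap(), which would match the
behaviour described above.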

And it fails while accessing the ACPI tables:

[    0.039205] ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
[    0.095098] Internal error: Oops: 96000021 [#1] SMP
[    0.100022] Modules linked in:
[    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
[    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
[    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
[    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
[    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
pstate: 60000045
[    0.132647] sp : ffff000008ccfb40
[    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
[    0.141354] x27: ffff0000088be820 x26: 0000000000000000
[    0.146718] x25: 000000000000001b x24: 0000000000000001
[    0.152083] x23: 0000000000000001 x22: ffff000009710027
[    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
[    0.162812] x19: 000000000000001b x18: 0000000000000005
[    0.168176] x17: 0000000000000000 x16: 0000000000000000
[    0.173541] x15: 0000000000000000 x14: 000000000000038e
[    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
[    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
[    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
[    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
[    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
[    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
[    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
[    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
[    0.223224] Call trace:
[    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
[    0.232194] fa00: 0000000000000000 ffff000009710027
ffff0000095e3980 ffff000008ccfbe0
[    0.240106] fa20: 0000000000000001 ffff80000fe62c00
ffff000008ccfc50 0000000000000000
[    0.248018] fa40: ffff8000126d0140 000000000000005f
00000000ffffff76 0000000000000006
[    0.255931] fa60: ffffffffffffffff ffffffff00000000
000000000000038e 0000000000000000
[    0.263843] fa80: 0000000000000000 0000000000000000
0000000000000005 000000000000001b
[    0.271754] faa0: 0000000000000001 ffff000008ccfc50
ffff000009710027 0000000000000001
[    0.279667] fac0: 0000000000000001 000000000000001b
0000000000000000 ffff0000088be820
[    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
ffff00000849b4f8 ffff000008ccfb40
[    0.295491] fb00: ffff0000084a6764 0000000060000045
ffff000008ccfb40 ffff000008260a18
[    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
ffff000008ccfb40 ffff0000084a6764
[    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
[    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
[    0.394500] ---[ end trace c46ed37f9651c58e ]---
[    0.399160] Kernel panic - not syncing: Fatal exception
[    0.404437] Rebooting in 10 seconds.

So, I think the linear mapping done by the primary kernel does not
make these accessible in the crash kernel directly.
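
As a cross-check, the Oops code and the *pte value above can be decoded by
hand. The Python sketch below uses the ARMv8 ESR_EL1 field layout (EC in bits
[31:26], DFSC in bits [5:0]) and the AttrIndx field of the stage-1 descriptor
(bits [4:2]). Treating x22 as the base register of the faulting load is an
assumption based on the (b94002c0) opcode, and the names given to the AttrIndx
values assume the MAIR layout of the 4.14 arm64 headers. If this reading is
right, the abort is an alignment fault on a Device-type mapping, not a missing
translation.

```python
ESR = 0x96000021            # "Internal error: Oops: 96000021"
PTE = 0x00E8000039710707    # "*pte=" value from the same Oops
INSN = 0xB94002C0           # faulting instruction from the "Code:" line
X22 = 0xFFFF000009710027    # x22 from the register dump

# ESR_EL1 fields (ARMv8-A).
ec = (ESR >> 26) & 0x3F     # exception class
wnr = (ESR >> 6) & 0x1      # 0 = read access, 1 = write access
dfsc = ESR & 0x3F           # data fault status code

# Stage-1 page descriptor: AttrIndx selects a MAIR_EL1 entry.
attr_indx = (PTE >> 2) & 0x7

# Load/store (unsigned immediate) encoding: size in [31:30], Rn in [9:5].
access_bytes = 1 << (INSN >> 30)
base_reg = (INSN >> 5) & 0x1F

EC_NAMES = {0x25: "Data Abort taken without a change in exception level"}
DFSC_NAMES = {0x21: "Alignment fault"}
# Assumption: index-to-type mapping as in the 4.14 arm64 headers.
MAIR_NAMES = {0: "Device-nGnRnE", 1: "Device-nGnRE", 4: "Normal"}

print("EC       :", EC_NAMES.get(ec, hex(ec)))
print("DFSC     :", DFSC_NAMES.get(dfsc, hex(dfsc)))
print("access   :", "write" if wnr else "read", f"of {access_bytes} bytes")
print("mem type :", MAIR_NAMES.get(attr_indx, f"index {attr_indx}"))
print(f"x{base_reg} % {access_bytes} =", X22 % access_bytes)
```

A 4-byte read through x22, which is not 4-byte aligned, on a Device mapping
would be expected to raise exactly this kind of fault, whereas the same access
through a Normal cacheable mapping would normally be allowed.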

Any pointers?

Regards,
Bhupesh

>> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
>> via a kernel command line parameter, "memmap=".
>>


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-17 21:01                                                           ` Bhupesh Sharma
  (?)
  (?)
@ 2017-12-18  5:16                                                             ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-18  5:16 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, kexec, linux-acpi, linux-kernel, AKASHI Takahiro,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

kexec@fedoraproject... is for discussing the Fedora kexec scripts, so I changed
it to kexec@lists.infradead.org.

I also added the linux-acpi list.

On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> > On 15 December 2017 at 09:59, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >>> <takahiro.akashi@linaro.org> wrote:
> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >>> >> <takahiro.akashi@linaro.org> wrote:
> >>> >> > Bhupesh, Ard,
> >>> >> >
> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >>> >> >> Hi Ard, Akashi
> >>> >> >>
> >>> >> > (snip)
> >>> >> >
> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >>> >> >> , for details)
> >>> >> >
> >>> >> > Right.
> >>> >> >
> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >>> >> >> with the crashkernel memory range:
> >>> >> >>
> >>> >> >>                 /* add linux,usable-memory-range */
> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >>> >> >>                                 address_cells, size_cells);
> >>> >> >>
> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >>> >> >> , for details)
> >>> >> >>
> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >>> >> >> they are marked as System RAM or as RESERVED. As,
> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >>> >> >>
> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >>> >> >> ACPI memory and crashes while trying to access the same:
> >>> >> >>
> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >>> >> >> -r`.img --reuse-cmdline -d
> >>> >> >>
> >>> >> >> [snip..]
> >>> >> >>
> >>> >> >> Reserved memory range
> >>> >> >> 000000000e800000-000000002e7fffff (0)
> >>> >> >>
> >>> >> >> Coredump memory ranges
> >>> >> >> 0000000000000000-000000000e7fffff (0)
> >>> >> >> 000000002e800000-000000003961ffff (0)
> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >>> >> >> 000000a000000000-000000affbffffff (0)
> >>> >> >>
> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >>> >> >> memory cap'ing passed to the crash kernel inside
> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >>> >> >>
> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >>> >> >> {
> >>> >> >>         struct memblock_region reg = {
> >>> >> >>                 .size = 0,
> >>> >> >>         };
> >>> >> >>
> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> >> >>
> >>> >> >>         if (reg.size)
> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >>> >> >> comment this out */
> >>> >> >> }
> >>> >> >
> >>> >> > Please just don't do that. It can fatally damage the
> >>> >> > memory contents of the *crashed* kernel.
> >>> >> >
> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >>> >> >>
> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> >> >> fail.
> >>> >> >>
> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >>> >> >> dt node 'linux,usable-memory-range'
> >>> >> >
> >>> >> > I still don't understand why we need to carry over the information
> >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
> >>> >> > such regions are free to be reused by the kernel after some point of
> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >>> >> >
> >>> >>
> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >>> >> kernel, those regions need to be preserved, which is why they are
> >>> >> memblock_reserve()'d now.
> >>> >
> >>> > For my better understanding, who is actually accessing such regions
> >>> > during boot time, uefi itself or efistub?
> >>> >
> >>>
> >>> No, only the kernel. This is where the ACPI tables are stored. For
> >>> instance, on QEMU we have
> >>>
> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >>>   01000013)
> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >>> BXPC 00000001)
> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >>> BXPC 00000001)
> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >>> BXPC 00000001)
> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >>> BXPC 00000001)
> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >>> BXPC 00000001)
> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >>> BXPC 00000001)
> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >>> BXPC 00000001)
> >>>
> >>> covered by
> >>>
> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >>>  ...
> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >>
> >> OK. I mistakenly understood those regions could be freed after exiting
> >> UEFI boot services.
> >>
> >>>
> >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >>> >> when booting the next kernel.
> >>> >
> >>> > not really.
> >>> >
> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >>> >> > on crash dump kernel?)
> >>> >> >
> >>> >>
> >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >>> >> regions only revealed the bug, not created it (given that other
> >>> >> memblock_reserve regions may be affected as well)
> >>> >
> >>> > As whether we should honor such reserved regions over kexec'ing
> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >>> > As a matter of fact, no information about "reserved" memblocks is
> >>> > exposed to user space (via proc/iomem).
> >>> >
> >>>
> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >>> as 'System RAM'. Do you think that could solve this?
> >>
> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> marking them under another name in /proc/iomem would also be good in order
> >> not to allocate them as part of crash kernel's memory.
> >>
> >
> > I agree. However, this may not be entirely trivial, since iterating
> > over the memblock_reserved table and creating iomem entries may result
> > in collisions.
> 
> I found a method (using the patch I shared earlier in this thread) to mark these
> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> reserved regions.
> 
> >> But I'm still not convinced that we should export them in
> >> usable-memory-range to the crash dump kernel. They will be accessed through
> >> acpi_os_map_memory() and so won't need to be part of system RAM
> >> (or memblocks), I guess.
> >
> > Agreed. They will be covered by the linear mapping in the boot kernel,
> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > which is exactly what we want in this case.
> 
> Now this is what is confusing me. I don't see the above happening.
> 
> I see that the primary kernel boots up and adds the ACPI regions via:
> acpi_os_ioremap
>     -> ioremap_cache
> 
> But during the crashkernel boot, 'acpi_os_ioremap' calls
> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> variant.
> 
> And it fails while accessing the ACPI tables:
> 
> [    0.039205] ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> [    0.100022] Modules linked in:
> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> pstate: 60000045
> [    0.132647] sp : ffff000008ccfb40
> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> [    0.146718] x25: 000000000000001b x24: 0000000000000001
> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> [    0.162812] x19: 000000000000001b x18: 0000000000000005
> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> [    0.223224] Call trace:
> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> [    0.232194] fa00: 0000000000000000 ffff000009710027
> ffff0000095e3980 ffff000008ccfbe0
> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> ffff000008ccfc50 0000000000000000
> [    0.248018] fa40: ffff8000126d0140 000000000000005f
> 00000000ffffff76 0000000000000006
> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> 000000000000038e 0000000000000000
> [    0.263843] fa80: 0000000000000000 0000000000000000
> 0000000000000005 000000000000001b
> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> ffff000009710027 0000000000000001
> [    0.279667] fac0: 0000000000000001 000000000000001b
> 0000000000000000 ffff0000088be820
> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> ffff00000849b4f8 ffff000008ccfb40
> [    0.295491] fb00: ffff0000084a6764 0000000060000045
> ffff000008ccfb40 ffff000008260a18
> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> ffff000008ccfb40 ffff0000084a6764
> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> [    0.399160] Kernel panic - not syncing: Fatal exception
> [    0.404437] Rebooting in 10 seconds.
> 
> So, I think the linear mapping done by the primary kernel does not
> make these accessible in the crash kernel directly.
> 
> Any pointers?

Can you get the code line number for acpi_ns_lookup+0x25c?
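
One way to do that, assuming a vmlinux with debug info that matches the
crashing 4.14.0-rc6 build and that the addresses in the Oops are not shifted
by KASLR: take the PC from the trace and hand it to addr2line. The Python
sketch below only does the bookkeeping (splitting the symbol+offset string and
building the command line); split_symbol() is a hypothetical helper and the
path "vmlinux" is a placeholder.

```python
import re


def split_symbol(sym_off):
    """Split 'name+0xOFF/0xSIZE' into (name, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", sym_off)
    if not m:
        raise ValueError(f"unrecognised symbol+offset: {sym_off!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


pc = 0xFFFF0000084A6764     # "pc :" value from the Oops
name, off, size = split_symbol("acpi_ns_lookup+0x25c/0x3c0")
func_start = pc - off       # where the function begins in the kernel image

# -f prints the function name, -i follows inlined frames, -e names the image.
cmd = ["addr2line", "-f", "-i", "-e", "vmlinux", hex(pc)]

print(f"{name} starts at {func_start:#x}, size {size:#x}")
print(" ".join(cmd))
```

The kernel tree also carries scripts/faddr2line, which accepts the
acpi_ns_lookup+0x25c/0x3c0 form directly, and gdb can do the same with
"list *(acpi_ns_lookup+0x25c)" when pointed at that vmlinux.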

> 
> Regards,
> Bhupesh
> 
> >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >> via a kernel command line parameter, "memmap=".
> >>
> _______________________________________________
> kexec mailing list -- kexec@lists.fedoraproject.org
> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  5:16                                                             ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-18  5:16 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, kexec, linux-acpi, linux-kernel, AKASHI Takahiro,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

kexec@fedoraproject... is for discussing the Fedora kexec scripts, so I changed
it to kexec@lists.infradead.org.

I also added the linux-acpi list.

On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> > On 15 December 2017 at 09:59, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >>> <takahiro.akashi@linaro.org> wrote:
> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >>> >> <takahiro.akashi@linaro.org> wrote:
> >>> >> > Bhupesh, Ard,
> >>> >> >
> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >>> >> >> Hi Ard, Akashi
> >>> >> >>
> >>> >> > (snip)
> >>> >> >
> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >>> >> >> , for details)
> >>> >> >
> >>> >> > Right.
> >>> >> >
> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >>> >> >> with the crashkernel memory range:
> >>> >> >>
> >>> >> >>                 /* add linux,usable-memory-range */
> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >>> >> >>                                 address_cells, size_cells);
> >>> >> >>
> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >>> >> >> , for details)
> >>> >> >>
> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >>> >> >> they are marked as System RAM or as RESERVED. As,
> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >>> >> >>
> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >>> >> >> ACPI memory and crashes while trying to access the same:
> >>> >> >>
> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >>> >> >> -r`.img --reuse-cmdline -d
> >>> >> >>
> >>> >> >> [snip..]
> >>> >> >>
> >>> >> >> Reserved memory range
> >>> >> >> 000000000e800000-000000002e7fffff (0)
> >>> >> >>
> >>> >> >> Coredump memory ranges
> >>> >> >> 0000000000000000-000000000e7fffff (0)
> >>> >> >> 000000002e800000-000000003961ffff (0)
> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >>> >> >> 000000a000000000-000000affbffffff (0)
> >>> >> >>
> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >>> >> >> memory cap'ing passed to the crash kernel inside
> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >>> >> >>
> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >>> >> >> {
> >>> >> >>         struct memblock_region reg = {
> >>> >> >>                 .size = 0,
> >>> >> >>         };
> >>> >> >>
> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> >> >>
> >>> >> >>         if (reg.size)
> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >>> >> >> comment this out */
> >>> >> >> }
> >>> >> >
> >>> >> > Please just don't do that. It can fatally damage the
> >>> >> > memory contents of the *crashed* kernel.
> >>> >> >
> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >>> >> >>
> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> >> >> fail.
> >>> >> >>
> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >>> >> >> dt node 'linux,usable-memory-range'
> >>> >> >
> >>> >> > I still don't understand why we need to carry over the information
> >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
> >>> >> > such regions are free to be reused by the kernel after some point of
> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >>> >> >
> >>> >>
> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >>> >> kernel, those regions need to be preserved, which is why they are
> >>> >> memblock_reserve()'d now.
> >>> >
> >>> > For my better understanding, who is actually accessing such regions
> >>> > during boot time, uefi itself or efistub?
> >>> >
> >>>
> >>> No, only the kernel. This is where the ACPI tables are stored. For
> >>> instance, on QEMU we have
> >>>
> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >>>   01000013)
> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >>> BXPC 00000001)
> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >>> BXPC 00000001)
> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >>> BXPC 00000001)
> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >>> BXPC 00000001)
> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >>> BXPC 00000001)
> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >>> BXPC 00000001)
> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >>> BXPC 00000001)
> >>>
> >>> covered by
> >>>
> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >>>  ...
> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >>
> >> OK. I mistakenly understood those regions could be freed after exiting
> >> UEFI boot services.
> >>
> >>>
> >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >>> >> when booting the next kernel.
> >>> >
> >>> > not really.
> >>> >
> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >>> >> > on crash dump kernel?)
> >>> >> >
> >>> >>
> >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >>> >> regions only revealed the bug, not created it (given that other
> >>> >> memblock_reserve regions may be affected as well)
> >>> >
> >>> > As whether we should honor such reserved regions over kexec'ing
> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >>> > As a matter of fact, no information about "reserved" memblocks is
> >>> > exposed to user space (via proc/iomem).
> >>> >
> >>>
> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >>> as 'System RAM'. Do you think that could solve this?
> >>
> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> marking them under another name in /proc/iomem would also be good in order
> >> not to allocate them as part of crash kernel's memory.
> >>
> >
> > I agree. However, this may not be entirely trivial, since iterating
> > over the memblock_reserved table and creating iomem entries may result
> > in collisions.
> 
> I found a method (using the patch I shared earlier in this thread) to mark these
> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> reserved regions.
> 
> >> But I'm still not convinced that we should export them in
> >> usable-memory-range to the crash dump kernel. They will be accessed through
> >> acpi_os_map_memory() and so won't need to be part of system RAM
> >> (or memblocks), I guess.
> >
> > Agreed. They will be covered by the linear mapping in the boot kernel,
> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > which is exactly what we want in this case.
> 
> Now this is what is confusing me. I don't see the above happening.
> 
> I see that the primary kernel boots up and adds the ACPI regions via:
> acpi_os_ioremap
>     -> ioremap_cache
> 
> But during the crashkernel boot, 'acpi_os_ioremap' calls
> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> variant.
> 
> And it fails while accessing the ACPI tables:
> 
> [    0.039205] ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> [    0.100022] Modules linked in:
> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045
> [    0.132647] sp : ffff000008ccfb40
> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> [    0.146718] x25: 000000000000001b x24: 0000000000000001
> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> [    0.162812] x19: 000000000000001b x18: 0000000000000005
> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> [    0.223224] Call trace:
> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> [    0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0
> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000
> [    0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006
> [    0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
> [    0.263843] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001
> [    0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40
> [    0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18
> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764
> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> [    0.399160] Kernel panic - not syncing: Fatal exception
> [    0.404437] Rebooting in 10 seconds.
> 
> So, I think the linear mapping done by the primary kernel does not
> make these accessible in the crash kernel directly.
> 
> Any pointers?

Can you get the code line number for acpi_ns_lookup+0x25c?

> 
> Regards,
> Bhupesh
> 
> >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump
> >> kernel via a kernel command line parameter, "memmap=".
> >>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  5:16                                                             ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-18  5:16 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, AKASHI Takahiro, Matt Fleming,
	Ard Biesheuvel, kexec, linux-kernel, linux-acpi, James Morse,
	Bhupesh SHARMA, linux-arm-kernel

kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
to kexec@lists.infradead.org

Also add linux-acpi list
On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> > On 15 December 2017 at 09:59, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >>> <takahiro.akashi@linaro.org> wrote:
> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >>> >> <takahiro.akashi@linaro.org> wrote:
> >>> >> > Bhupesh, Ard,
> >>> >> >
> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >>> >> >> Hi Ard, Akashi
> >>> >> >>
> >>> >> > (snip)
> >>> >> >
> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >>> >> >> , for details)
> >>> >> >
> >>> >> > Right.
> >>> >> >
> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >>> >> >> with the crashkernel memory range:
> >>> >> >>
> >>> >> >>                 /* add linux,usable-memory-range */
> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >>> >> >>                                 address_cells, size_cells);
> >>> >> >>
> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >>> >> >> , for details)
> >>> >> >>
> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >>> >> >> they are marked as System RAM or as RESERVED. As,
> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >>> >> >>
> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >>> >> >> ACPI memory and crashes while trying to access the same:
> >>> >> >>
> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >>> >> >> -r`.img --reuse-cmdline -d
> >>> >> >>
> >>> >> >> [snip..]
> >>> >> >>
> >>> >> >> Reserved memory range
> >>> >> >> 000000000e800000-000000002e7fffff (0)
> >>> >> >>
> >>> >> >> Coredump memory ranges
> >>> >> >> 0000000000000000-000000000e7fffff (0)
> >>> >> >> 000000002e800000-000000003961ffff (0)
> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >>> >> >> 000000a000000000-000000affbffffff (0)
> >>> >> >>
> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >>> >> >> memory cap'ing passed to the crash kernel inside
> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >>> >> >>
> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >>> >> >> {
> >>> >> >>         struct memblock_region reg = {
> >>> >> >>                 .size = 0,
> >>> >> >>         };
> >>> >> >>
> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> >> >>
> >>> >> >>         if (reg.size)
> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >>> >> >> comment this out */
> >>> >> >> }
> >>> >> >
> >>> >> > Please just don't do that. It can cause a fatal damage on
> >>> >> > memory contents of the *crashed* kernel.
> >>> >> >
> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >>> >> >>
> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> >> >> fail.
> >>> >> >>
> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >>> >> >> dt node 'linux,usable-memory-range'
> >>> >> >
> >>> >> > I still don't understand why we need to carry over the information
> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >>> >> > such regions are free to be reused by the kernel after some point of
> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >>> >> >
> >>> >>
> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >>> >> kernel, those regions needs to be preserved, which is why they are
> >>> >> memblock_reserve()'d now.
> >>> >
> >>> > For my better understandings, who is actually accessing such regions
> >>> > during boot time, uefi itself or efistub?
> >>> >
> >>>
> >>> No, only the kernel. This is where the ACPI tables are stored. For
> >>> instance, on QEMU we have
> >>>
> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >>>   01000013)
> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >>> BXPC 00000001)
> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >>> BXPC 00000001)
> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >>> BXPC 00000001)
> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >>> BXPC 00000001)
> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >>> BXPC 00000001)
> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >>> BXPC 00000001)
> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >>> BXPC 00000001)
> >>>
> >>> covered by
> >>>
> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >>>  ...
> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >>
> >> OK. I mistakenly understood those regions could be freed after exiting
> >> UEFI boot services.
> >>
> >>>
> >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >>> >> when booting the next kernel.
> >>> >
> >>> > not really.
> >>> >
> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >>> >> > on crash dump kernel?)
> >>> >> >
> >>> >>
> >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >>> >> regions only revealed the bug, not created it (given that other
> >>> >> memblock_reserve regions may be affected as well)
> >>> >
> >>> > As whether we should honor such reserved regions over kexec'ing
> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >>> > As a matter of fact, no information about "reserved" memblocks is
> >>> > exposed to user space (via proc/iomem).
> >>> >
> >>>
> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >>> as 'System RAM'. Do you think that could solve this?
> >>
> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> marking them under another name in /proc/iomem would also be good in order
> >> not to allocate them as part of crash kernel's memory.
> >>
> >
> > I agree. However, this may not be entirely trivial, since iterating
> > over the memblock_reserved table and creating iomem entries may result
> > in collisions.
> 
> I found a method (using the patch I shared earlier in this thread) to mark these
> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> reserved regions.
> 
> >> But I'm not still convinced that we should export them in useable-
> >> memory-range to crash dump kernel. They will be accessed through
> >> acpi_os_map_memory() and so won't be required to be part of system ram
> >> (or memblocks), I guess.
> >
> > Agreed. They will be covered by the linear mapping in the boot kernel,
> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > which is exactly what we want in this case.
> 
> Now this is what is confusing me. I don't see the above happening.
> 
> I see that the primary kernel boots up and adds the ACPI regions via:
> acpi_os_ioremap
>     -> ioremap_cache
> 
> But during the crashkernel boot, ''acpi_os_ioremap' calls
> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> variant.
> 
> And it fails while accessing the ACPI tables:
> 
> [    0.039205] ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> [    0.100022] Modules linked in:
> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> pstate: 60000045
> [    0.132647] sp : ffff000008ccfb40
> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> [    0.146718] x25: 000000000000001b x24: 0000000000000001
> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> [    0.162812] x19: 000000000000001b x18: 0000000000000005
> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> [    0.223224] Call trace:
> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> [    0.232194] fa00: 0000000000000000 ffff000009710027
> ffff0000095e3980 ffff000008ccfbe0
> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> ffff000008ccfc50 0000000000000000
> [    0.248018] fa40: ffff8000126d0140 000000000000005f
> 00000000ffffff76 0000000000000006
> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> 000000000000038e 0000000000000000
> [    0.263843] fa80: 0000000000000000 0000000000000000
> 0000000000000005 000000000000001b
> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> ffff000009710027 0000000000000001
> [    0.279667] fac0: 0000000000000001 000000000000001b
> 0000000000000000 ffff0000088be820
> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> ffff00000849b4f8 ffff000008ccfb40
> [    0.295491] fb00: ffff0000084a6764 0000000060000045
> ffff000008ccfb40 ffff000008260a18
> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> ffff000008ccfb40 ffff0000084a6764
> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> [    0.399160] Kernel panic - not syncing: Fatal exception
> [    0.404437] Rebooting in 10 seconds.
> 
> So, I think the linear mapping done by the primary kernel does not
> make these accessible in the crash kernel directly.
> 
> Any pointers?

Can you get the code line number for acpi_ns_lookup+0x25c?

> 
> Regards,
> Bhupesh
> 
> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> via a kernel command line parameter, "memmap=".
> >>

^ permalink raw reply	[flat|nested] 135+ messages in thread
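
For readers following the 'linux,usable-memory-range' discussion in this thread: a device tree range property is a sequence of big-endian 32-bit cells, base first, then size. The following is only a minimal sketch of that encoding under that assumption. The names put_cells() and encode_range() are hypothetical and are not the kexec-tools helpers; real code would hand the resulting buffer to libfdt.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store val as `cells` big-endian 32-bit cells starting at buf.
 * Returns the number of bytes written. (Hypothetical helper.)
 */
static size_t put_cells(uint8_t *buf, uint64_t val, unsigned int cells)
{
	for (unsigned int i = 0; i < cells; i++) {
		unsigned int shift = 32 * (cells - 1 - i);
		/* Cells above bit 63 of a 64-bit value are always zero. */
		uint32_t cell = shift < 64 ? (uint32_t)(val >> shift) : 0;

		buf[4 * i + 0] = (uint8_t)(cell >> 24);
		buf[4 * i + 1] = (uint8_t)(cell >> 16);
		buf[4 * i + 2] = (uint8_t)(cell >> 8);
		buf[4 * i + 3] = (uint8_t)cell;
	}
	return 4 * (size_t)cells;
}

/*
 * Encode <base size> using the given #address-cells and #size-cells.
 * buf must hold at least 4 * (address_cells + size_cells) bytes.
 */
static size_t encode_range(uint8_t *buf, uint64_t base, uint64_t size,
			   unsigned int address_cells,
			   unsigned int size_cells)
{
	size_t n = put_cells(buf, base, address_cells);

	n += put_cells(buf + n, size, size_cells);
	return n;
}
```

With two address cells and two size cells, the crash kernel range quoted in this thread (base 0x0e800000, size 0x20000000) would occupy 16 bytes.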

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-15  8:59                                                     ` AKASHI Takahiro
@ 2017-12-18  5:40                                                       ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-18  5:40 UTC (permalink / raw)
  To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > On 13 December 2017 at 12:16, AKASHI Takahiro
> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > >> > Bhupesh, Ard,
> > >> >
> > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > >> >> Hi Ard, Akashi
> > >> >>
> > >> > (snip)
> > >> >
> > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > >> >> identify its own usable memory and exclude, at its boot time, any
> > >> >> other memory areas that are part of the panicked kernel's memory.
> > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > >> >> , for details)
> > >> >
> > >> > Right.
> > >> >
> > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > >> >> with the crashkernel memory range:
> > >> >>
> > >> >>                 /* add linux,usable-memory-range */
> > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > >> >>                                 address_cells, size_cells);
> > >> >>
> > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > >> >> , for details)
> > >> >>
> > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > >> >> they are marked as System RAM or as RESERVED. As,
> > >> >> 'linux,usable-memory-range' dt node is patched up only with
> > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > >> >>
> > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > >> >> ACPI memory and crashes while trying to access the same:
> > >> >>
> > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> > >> >>
> > >> >> [snip..]
> > >> >>
> > >> >> Reserved memory range
> > >> >> 000000000e800000-000000002e7fffff (0)
> > >> >>
> > >> >> Coredump memory ranges
> > >> >> 0000000000000000-000000000e7fffff (0)
> > >> >> 000000002e800000-000000003961ffff (0)
> > >> >> 0000000039d40000-000000003ed2ffff (0)
> > >> >> 000000003ed60000-000000003fbfffff (0)
> > >> >> 0000001040000000-0000001ffbffffff (0)
> > >> >> 0000002000000000-0000002ffbffffff (0)
> > >> >> 0000009000000000-0000009ffbffffff (0)
> > >> >> 000000a000000000-000000affbffffff (0)
> > >> >>
> > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > >> >> memory cap'ing passed to the crash kernel inside
> > >> >> 'arch/arm64/mm/init.c' (see below):
> > >> >>
> > >> >> static void __init fdt_enforce_memory_region(void)
> > >> >> {
> > >> >>         struct memblock_region reg = {
> > >> >>                 .size = 0,
> > >> >>         };
> > >> >>
> > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > >> >>
> > >> >>         if (reg.size)
> > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> > >> >> }
> > >> >
> > >> > Please just don't do that. It can cause a fatal damage on
> > >> > memory contents of the *crashed* kernel.
> > >> >
> > >> >> 5). Both the above temporary solutions fix the problem.
> > >> >>
> > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > >> >> fail.
> > >> >>
> > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > >> >> dt node 'linux,usable-memory-range'
> > >> >
> > >> > I still don't understand why we need to carry over the information
> > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > >> > such regions are free to be reused by the kernel after some point of
> > >> > initialization. Why does crash dump kernel need to know about them?
> > >> >
> > >>
> > >> Not really. According to the UEFI spec, they can be reclaimed after
> > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > >> no longer needs them. Of course, in order to be able to boot a kexec
> > >> kernel, those regions needs to be preserved, which is why they are
> > >> memblock_reserve()'d now.
> > >
> > > For my better understandings, who is actually accessing such regions
> > > during boot time, uefi itself or efistub?
> > >
> > 
> > No, only the kernel. This is where the ACPI tables are stored. For
> > instance, on QEMU we have
> > 
> >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001      01000013)
> >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
> >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
> >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
> >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
> > 
> > covered by
> > 
> >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >  ...
> >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> 
> OK. I mistakenly understood those regions could be freed after exiting
> UEFI boot services.
> 
> > 
> > >> So it seems that kexec does not honour the memblock_reserve() table
> > >> when booting the next kernel.
> > >
> > > not really.
> > >
> > >> > (In other words, can or should we skip some part of ACPI-related init code
> > >> > on crash dump kernel?)
> > >> >
> > >>
> > >> I don't think so. And the change to the handling of ACPI reclaim
> > >> regions only revealed the bug, not created it (given that other
> > >> memblock_reserve regions may be affected as well)
> > >
> > > As whether we should honor such reserved regions over kexec'ing
> > > depends on each one's specific nature, we will have to take care one-by-one.
> > > As a matter of fact, no information about "reserved" memblocks is
> > > exposed to user space (via proc/iomem).
> > >
> > 
> > That is why I suggested (somewhere in this thread?) to not expose them
> > as 'System RAM'. Do you think that could solve this?
> 
> Memblock-reserv'ing them is necessary to prevent their corruption and
> marking them under another name in /proc/iomem would also be good in order
> not to allocate them as part of crash kernel's memory.
> 
> But I'm not still convinced that we should export them in useable-
> memory-range to crash dump kernel. They will be accessed through
> acpi_os_map_memory() and so won't be required to be part of system ram
> (or memblocks), I guess.
> 	-> Bhupesh?

I forget how the arm64 kernel retrieves the memory ranges and
initializes them.  If there is no "e820"-like interface, shouldn't the
kernel reinitialize all the memory according to the EFI memmap?  For the
kdump kernel, anything other than usable memory (which comes from the dt
node instead) should be reinitialized according to the info passed by
EFI, no?

> 
> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> via a kernel command line parameter, "memmap=".

memmap= is only used by old kexec-tools; now we pass them via the
e820 table.

[snip]

Thanks
Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread
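
As an illustration of the capping step discussed in this thread (fdt_enforce_memory_region() calling memblock_cap_memory_range()), here is a minimal user-space sketch of what clipping a set of memory ranges to a single usable window looks like. It is not the kernel's memblock implementation; struct range, cap_ranges() and range_contains() are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory region: [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/*
 * Clip every range in r[] to [cap_base, cap_base + cap_size) and drop
 * ranges that fall entirely outside it. Returns the new range count.
 */
static size_t cap_ranges(struct range *r, size_t n,
			 uint64_t cap_base, uint64_t cap_size)
{
	uint64_t cap_end = cap_base + cap_size;
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = r[i].base;
		uint64_t end = r[i].base + r[i].size;

		if (start < cap_base)
			start = cap_base;
		if (end > cap_end)
			end = cap_end;
		if (start >= end)
			continue;	/* nothing left inside the window */

		r[out].base = start;
		r[out].size = end - start;
		out++;
	}
	return out;
}

/* Returns 1 if addr lies inside any of the n ranges, otherwise 0. */
static int range_contains(const struct range *r, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].base && addr - r[i].base < r[i].size)
			return 1;
	return 0;
}
```

Fed with the ranges printed by 'kexec -d' earlier in the thread and capped to base 0x0e800000, size 0x20000000, only the crash kernel's reserved range survives; every address outside it, whatever it holds, is no longer part of the range list.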

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
       [not found]                                                       ` <20171218054009.GA6392-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
  2017-12-19  6:09                                                           ` AKASHI Takahiro
  (?)
@ 2017-12-18  5:43                                                         ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-18  5:43 UTC (permalink / raw)
  To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland,
	James Morse, kexec, linux-kernel

Fix the kexec list address.

On 12/18/17 at 01:40pm, Dave Young wrote:
> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > > >> <takahiro.akashi@linaro.org> wrote:
> > > >> > Bhupesh, Ard,
> > > >> >
> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > > >> >> Hi Ard, Akashi
> > > >> >>
> > > >> > (snip)
> > > >> >
> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > > >> >> identify its own usable memory and exclude, at its boot time, any
> > > >> >> other memory areas that are part of the panicked kernel's memory.
> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > > >> >> , for details)
> > > >> >
> > > >> > Right.
> > > >> >
> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > > >> >> with the crashkernel memory range:
> > > >> >>
> > > >> >>                 /* add linux,usable-memory-range */
> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > > >> >>                                 address_cells, size_cells);
> > > >> >>
> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > > >> >> , for details)
> > > >> >>
> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > > >> >> they are marked as System RAM or as RESERVED. As,
> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > > >> >>
> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > > >> >> ACPI memory and crashes while trying to access the same:
> > > >> >>
> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> > > >> >>
> > > >> >> [snip..]
> > > >> >>
> > > >> >> Reserved memory range
> > > >> >> 000000000e800000-000000002e7fffff (0)
> > > >> >>
> > > >> >> Coredump memory ranges
> > > >> >> 0000000000000000-000000000e7fffff (0)
> > > >> >> 000000002e800000-000000003961ffff (0)
> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> > > >> >> 000000003ed60000-000000003fbfffff (0)
> > > >> >> 0000001040000000-0000001ffbffffff (0)
> > > >> >> 0000002000000000-0000002ffbffffff (0)
> > > >> >> 0000009000000000-0000009ffbffffff (0)
> > > >> >> 000000a000000000-000000affbffffff (0)
> > > >> >>
> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > > >> >> memory cap'ing passed to the crash kernel inside
> > > >> >> 'arch/arm64/mm/init.c' (see below):
> > > >> >>
> > > >> >> static void __init fdt_enforce_memory_region(void)
> > > >> >> {
> > > >> >>         struct memblock_region reg = {
> > > >> >>                 .size = 0,
> > > >> >>         };
> > > >> >>
> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > >> >>
> > > >> >>         if (reg.size)
> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> > > >> >> }
> > > >> >
> > > >> > Please just don't do that. It can cause a fatal damage on
> > > >> > memory contents of the *crashed* kernel.
> > > >> >
> > > >> >> 5). Both the above temporary solutions fix the problem.
> > > >> >>
> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > > >> >> fail.
> > > >> >>
> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > > >> >> dt node 'linux,usable-memory-range'
> > > >> >
> > > >> > I still don't understand why we need to carry over the information
> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > > >> > such regions are free to be reused by the kernel after some point of
> > > >> > initialization. Why does crash dump kernel need to know about them?
> > > >> >
> > > >>
> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> > > >> kernel, those regions needs to be preserved, which is why they are
> > > >> memblock_reserve()'d now.
> > > >
> > > > For my better understandings, who is actually accessing such regions
> > > > during boot time, uefi itself or efistub?
> > > >
> > > 
> > > No, only the kernel. This is where the ACPI tables are stored. For
> > > instance, on QEMU we have
> > > 
> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001      01000013)
> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
> > > 
> > > covered by
> > > 
> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >  ...
> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > 
> > OK. I mistakenly understood those regions could be freed after exiting
> > UEFI boot services.
> > 
> > > 
> > > >> So it seems that kexec does not honour the memblock_reserve() table
> > > >> when booting the next kernel.
> > > >
> > > > not really.
> > > >
> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> > > >> > on crash dump kernel?)
> > > >> >
> > > >>
> > > >> I don't think so. And the change to the handling of ACPI reclaim
> > > >> regions only revealed the bug, not created it (given that other
> > > >> memblock_reserve regions may be affected as well)
> > > >
> > > > As whether we should honor such reserved regions over kexec'ing
> > > > depends on each one's specific nature, we will have to take care one-by-one.
> > > > As a matter of fact, no information about "reserved" memblocks is
> > > > exposed to user space (via proc/iomem).
> > > >
> > > 
> > > That is why I suggested (somewhere in this thread?) to not expose them
> > > as 'System RAM'. Do you think that could solve this?
> > 
> > Memblock-reserv'ing them is necessary to prevent their corruption and
> > marking them under another name in /proc/iomem would also be good in order
> > not to allocate them as part of crash kernel's memory.
> > 
> > But I'm not still convinced that we should export them in useable-
> > memory-range to crash dump kernel. They will be accessed through
> > acpi_os_map_memory() and so won't be required to be part of system ram
> > (or memblocks), I guess.
> > 	-> Bhupesh?
> 
> I forgot how arm64 kernel retrieve the memory ranges and initialize
> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> the memory according to the efi memmap?  For kdump kernel anything other
> than usable memory (which is from the dt node instead) should be
> reinitialized according to efi passed info, no?
> 
> > 
> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > via a kernel command line parameter, "memmap=".
> 
> memmap= is only used in old kexec-tools, now we are passing them via
> e820 table.
> 
> [snip]
> 
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread
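
The QEMU listing quoted in this thread says the ACPI tables are "covered by" the EFI ACPI Reclaim Memory descriptors. That claim can be checked mechanically. The sketch below hard-codes the table addresses and lengths and the two descriptor ranges exactly as quoted. The type and function names (struct span, struct acpi_table, span_covers(), count_uncovered()) are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive physical address range, as printed in the "efi:" lines. */
struct span {
	uint64_t start;
	uint64_t end;
};

/* The two ACPI Reclaim Memory ranges quoted in the thread. */
static const struct span acpi_reclaim[] = {
	{ 0x788e0000ULL, 0x7894ffffULL },
	{ 0x78970000ULL, 0x7898ffffULL },
};

struct acpi_table {
	const char *sig;
	uint64_t addr;
	uint64_t len;
};

/* Addresses and lengths from the quoted "ACPI:" boot log lines. */
static const struct acpi_table tables[] = {
	{ "RSDP", 0x78980000ULL, 0x000024 },
	{ "XSDT", 0x78970000ULL, 0x000054 },
	{ "FACP", 0x78930000ULL, 0x00010C },
	{ "DSDT", 0x78940000ULL, 0x0011DA },
	{ "APIC", 0x78920000ULL, 0x000140 },
	{ "GTDT", 0x78910000ULL, 0x000060 },
	{ "MCFG", 0x78900000ULL, 0x00003C },
	{ "SPCR", 0x788F0000ULL, 0x000050 },
	{ "IORT", 0x788E0000ULL, 0x00007C },
};

/* Returns 1 if [addr, addr + len) fits entirely inside one span. */
static int span_covers(const struct span *s, size_t n,
		       uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < n; i++)
		if (len && addr >= s[i].start && addr + len - 1 <= s[i].end)
			return 1;
	return 0;
}

/* Counts the tables that are not fully inside any reclaim range. */
static size_t count_uncovered(void)
{
	size_t missing = 0;

	for (size_t i = 0; i < sizeof(tables) / sizeof(tables[0]); i++)
		if (!span_covers(acpi_reclaim,
				 sizeof(acpi_reclaim) / sizeof(acpi_reclaim[0]),
				 tables[i].addr, tables[i].len))
			missing++;
	return missing;
}
```

The same containment test, pointed at the single range passed in 'linux,usable-memory-range', is a quick way to see whether a given table would be visible to the crash dump kernel.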

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  5:43                                                         ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-18  5:43 UTC (permalink / raw)
  To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Fix the kexec list address.

On 12/18/17 at 01:40pm, Dave Young wrote:
> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > > >> > Bhupesh, Ard,
> > > >> >
> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > > >> >> Hi Ard, Akashi
> > > >> >>
> > > >> > (snip)
> > > >> >
> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > > >> >> identify its own usable memory and exclude, at its boot time, any
> > > >> >> other memory areas that are part of the panicked kernel's memory.
> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > > >> >> , for details)
> > > >> >
> > > >> > Right.
> > > >> >
> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > > >> >> with the crashkernel memory range:
> > > >> >>
> > > >> >>                 /* add linux,usable-memory-range */
> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > > >> >>                                 address_cells, size_cells);
> > > >> >>
> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > > >> >> , for details)
> > > >> >>
> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > > >> >> they are marked as System RAM or as RESERVED. As,
> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > > >> >>
> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > > >> >> ACPI memory and crashes while trying to access the same:
> > > >> >>
> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> > > >> >>
> > > >> >> [snip..]
> > > >> >>
> > > >> >> Reserved memory range
> > > >> >> 000000000e800000-000000002e7fffff (0)
> > > >> >>
> > > >> >> Coredump memory ranges
> > > >> >> 0000000000000000-000000000e7fffff (0)
> > > >> >> 000000002e800000-000000003961ffff (0)
> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> > > >> >> 000000003ed60000-000000003fbfffff (0)
> > > >> >> 0000001040000000-0000001ffbffffff (0)
> > > >> >> 0000002000000000-0000002ffbffffff (0)
> > > >> >> 0000009000000000-0000009ffbffffff (0)
> > > >> >> 000000a000000000-000000affbffffff (0)
> > > >> >>
> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > > >> >> memory cap'ing passed to the crash kernel inside
> > > >> >> 'arch/arm64/mm/init.c' (see below):
> > > >> >>
> > > >> >> static void __init fdt_enforce_memory_region(void)
> > > >> >> {
> > > >> >>         struct memblock_region reg = {
> > > >> >>                 .size = 0,
> > > >> >>         };
> > > >> >>
> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > >> >>
> > > >> >>         if (reg.size)
> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> > > >> >> }
> > > >> >
> > > >> > Please just don't do that. It can cause a fatal damage on
> > > >> > memory contents of the *crashed* kernel.
> > > >> >
> > > >> >> 5). Both the above temporary solutions fix the problem.
> > > >> >>
> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > > >> >> fail.
> > > >> >>
> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > > >> >> dt node 'linux,usable-memory-range'
> > > >> >
> > > >> > I still don't understand why we need to carry over the information
> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > > >> > such regions are free to be reused by the kernel after some point of
> > > >> > initialization. Why does crash dump kernel need to know about them?
> > > >> >
> > > >>
> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> > > >> kernel, those regions needs to be preserved, which is why they are
> > > >> memblock_reserve()'d now.
> > > >
> > > > For my better understandings, who is actually accessing such regions
> > > > during boot time, uefi itself or efistub?
> > > >
> > > 
> > > No, only the kernel. This is where the ACPI tables are stored. For
> > > instance, on QEMU we have
> > > 
> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> > >   01000013)
> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> > > BXPC 00000001)
> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> > > BXPC 00000001)
> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> > > BXPC 00000001)
> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> > > BXPC 00000001)
> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> > > BXPC 00000001)
> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> > > BXPC 00000001)
> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> > > BXPC 00000001)
> > > 
> > > covered by
> > > 
> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >  ...
> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > 
> > OK. I mistakenly understood those regions could be freed after exiting
> > UEFI boot services.
> > 
> > > 
> > > >> So it seems that kexec does not honour the memblock_reserve() table
> > > >> when booting the next kernel.
> > > >
> > > > not really.
> > > >
> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> > > >> > on crash dump kernel?)
> > > >> >
> > > >>
> > > >> I don't think so. And the change to the handling of ACPI reclaim
> > > >> regions only revealed the bug, not created it (given that other
> > > >> memblock_reserve regions may be affected as well)
> > > >
> > > > As whether we should honor such reserved regions over kexec'ing
> > > > depends on each one's specific nature, we will have to take care one-by-one.
> > > > As a matter of fact, no information about "reserved" memblocks is
> > > > exposed to user space (via proc/iomem).
> > > >
> > > 
> > > That is why I suggested (somewhere in this thread?) to not expose them
> > > as 'System RAM'. Do you think that could solve this?
> > 
> > Memblock-reserv'ing them is necessary to prevent their corruption and
> > marking them under another name in /proc/iomem would also be good in order
> > not to allocate them as part of crash kernel's memory.
> > 
> > But I'm not still convinced that we should export them in useable-
> > memory-range to crash dump kernel. They will be accessed through
> > acpi_os_map_memory() and so won't be required to be part of system ram
> > (or memblocks), I guess.
> > 	-> Bhupesh?
> 
> I forgot how arm64 kernel retrieve the memory ranges and initialize
> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> the memory according to the efi memmap?  For kdump kernel anything other
> than usable memory (which is from the dt node instead) should be
> reinitialized according to efi passed info, no?
> 
> > 
> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > via a kernel command line parameter, "memmap=".
> 
> memmap= is only used in old kexec-tools, now we are passing them via
> e820 table.
> 
> [snip]
> 
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18  5:16                                                             ` Dave Young
  (?)
  (?)
@ 2017-12-18  5:54                                                               ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18  5:54 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
> 
> Also add linux-acpi list

Thank you.

> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> > >>> <takahiro.akashi@linaro.org> wrote:
> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > >>> >> <takahiro.akashi@linaro.org> wrote:
> > >>> >> > Bhupesh, Ard,
> > >>> >> >
> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > >>> >> >> Hi Ard, Akashi
> > >>> >> >>
> > >>> >> > (snip)
> > >>> >> >
> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > >>> >> >> , for details)
> > >>> >> >
> > >>> >> > Right.
> > >>> >> >
> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > >>> >> >> with the crashkernel memory range:
> > >>> >> >>
> > >>> >> >>                 /* add linux,usable-memory-range */
> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > >>> >> >>                                 address_cells, size_cells);
> > >>> >> >>
> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > >>> >> >> , for details)
> > >>> >> >>
> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > >>> >> >> they are marked as System RAM or as RESERVED. As,
> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > >>> >> >>
> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > >>> >> >> ACPI memory and crashes while trying to access the same:
> > >>> >> >>
> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> > >>> >> >> -r`.img --reuse-cmdline -d
> > >>> >> >>
> > >>> >> >> [snip..]
> > >>> >> >>
> > >>> >> >> Reserved memory range
> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> > >>> >> >>
> > >>> >> >> Coredump memory ranges
> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> > >>> >> >> 000000002e800000-000000003961ffff (0)
> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> > >>> >> >> 000000a000000000-000000affbffffff (0)
> > >>> >> >>
> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > >>> >> >> memory cap'ing passed to the crash kernel inside
> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> > >>> >> >>
> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> > >>> >> >> {
> > >>> >> >>         struct memblock_region reg = {
> > >>> >> >>                 .size = 0,
> > >>> >> >>         };
> > >>> >> >>
> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > >>> >> >>
> > >>> >> >>         if (reg.size)
> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> > >>> >> >> comment this out */
> > >>> >> >> }
> > >>> >> >
> > >>> >> > Please just don't do that. It can cause a fatal damage on
> > >>> >> > memory contents of the *crashed* kernel.
> > >>> >> >
> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> > >>> >> >>
> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > >>> >> >> fail.
> > >>> >> >>
> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > >>> >> >> dt node 'linux,usable-memory-range'
> > >>> >> >
> > >>> >> > I still don't understand why we need to carry over the information
> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > >>> >> > such regions are free to be reused by the kernel after some point of
> > >>> >> > initialization. Why does crash dump kernel need to know about them?
> > >>> >> >
> > >>> >>
> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> > >>> >> kernel, those regions needs to be preserved, which is why they are
> > >>> >> memblock_reserve()'d now.
> > >>> >
> > >>> > For my better understandings, who is actually accessing such regions
> > >>> > during boot time, uefi itself or efistub?
> > >>> >
> > >>>
> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> > >>> instance, on QEMU we have
> > >>>
> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> > >>>   01000013)
> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> > >>> BXPC 00000001)
> > >>>
> > >>> covered by
> > >>>
> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >>>  ...
> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > >>
> > >> OK. I mistakenly understood those regions could be freed after exiting
> > >> UEFI boot services.
> > >>
> > >>>
> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> > >>> >> when booting the next kernel.
> > >>> >
> > >>> > not really.
> > >>> >
> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> > >>> >> > on crash dump kernel?)
> > >>> >> >
> > >>> >>
> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> > >>> >> regions only revealed the bug, not created it (given that other
> > >>> >> memblock_reserve regions may be affected as well)
> > >>> >
> > >>> > As whether we should honor such reserved regions over kexec'ing
> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
> > >>> > As a matter of fact, no information about "reserved" memblocks is
> > >>> > exposed to user space (via proc/iomem).
> > >>> >
> > >>>
> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> > >>> as 'System RAM'. Do you think that could solve this?
> > >>
> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
> > >> marking them under another name in /proc/iomem would also be good in order
> > >> not to allocate them as part of crash kernel's memory.
> > >>
> > >
> > > I agree. However, this may not be entirely trivial, since iterating
> > > over the memblock_reserved table and creating iomem entries may result
> > > in collisions.
> > 
> > I found a method (using the patch I shared earlier in this thread) to mark these
> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> > reserved regions.
> > 
> > >> But I'm not still convinced that we should export them in useable-
> > >> memory-range to crash dump kernel. They will be accessed through
> > >> acpi_os_map_memory() and so won't be required to be part of system ram
> > >> (or memblocks), I guess.
> > >
> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > > which is exactly what we want in this case.
> > 
> > Now this is what is confusing me. I don't see the above happening.
> > 
> > I see that the primary kernel boots up and adds the ACPI regions via:
> > acpi_os_ioremap
> >     -> ioremap_cache
> > 
> > But during the crashkernel boot, ''acpi_os_ioremap' calls
> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> > variant.

That is expected if the region lies outside the memblock memory ranges.

> > And it fails while accessing the ACPI tables:
> > 
> > [    0.039205] ACPI: Core revision 20170728
> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP

This (ESR = 0x96000021) means that a Data Abort caused by an Alignment fault
happened. As ioremap() maps the region as "Device memory", unaligned memory
accesses are not allowed.

> > [    0.100022] Modules linked in:
> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> > pstate: 60000045
> > [    0.132647] sp : ffff000008ccfb40
> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> > [    0.223224] Call trace:
> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> > ffff0000095e3980 ffff000008ccfbe0
> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> > ffff000008ccfc50 0000000000000000
> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> > 00000000ffffff76 0000000000000006
> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> > 000000000000038e 0000000000000000
> > [    0.263843] fa80: 0000000000000000 0000000000000000
> > 0000000000000005 000000000000001b
> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> > ffff000009710027 0000000000000001
> > [    0.279667] fac0: 0000000000000001 000000000000001b
> > 0000000000000000 ffff0000088be820
> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> > ffff00000849b4f8 ffff000008ccfb40
> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> > ffff000008ccfb40 ffff000008260a18
> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> > ffff000008ccfb40 ffff0000084a6764
> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> > [    0.399160] Kernel panic - not syncing: Fatal exception
> > [    0.404437] Rebooting in 10 seconds.
> > 
> > So, I think the linear mapping done by the primary kernel does not
> > make these accessible in the crash kernel directly.
> > 
> > Any pointers?
> 
> Can you get the code line number for acpi_ns_lookup+0x25c?

So should we avoid ioremap() in acpi_os_ioremap() entirely, or
modify acpi_ns_lookup() (or any other ACPI functions) to prevent unaligned
accesses?
(I could not find out how unaligned accesses could happen there.)

Thanks,
-Takahiro AKASHI

> > 
> > Regards,
> > Bhupesh
> > 
> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > >> via a kernel command line parameter, "memmap=".
> > >>
> > _______________________________________________
> > kexec mailing list -- kexec@lists.fedoraproject.org
> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  5:54                                                               ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18  5:54 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
> 
> Also add linux-acpi list

Thank you.

> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> > >>> <takahiro.akashi@linaro.org> wrote:
> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > >>> >> <takahiro.akashi@linaro.org> wrote:
> > >>> >> > Bhupesh, Ard,
> > >>> >> >
> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > >>> >> >> Hi Ard, Akashi
> > >>> >> >>
> > >>> >> > (snip)
> > >>> >> >
> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > >>> >> >> , for details)
> > >>> >> >
> > >>> >> > Right.
> > >>> >> >
> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > >>> >> >> with the crashkernel memory range:
> > >>> >> >>
> > >>> >> >>                 /* add linux,usable-memory-range */
> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > >>> >> >>                                 address_cells, size_cells);
> > >>> >> >>
> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > >>> >> >> , for details)
> > >>> >> >>
> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > >>> >> >> they are marked as System RAM or as RESERVED. As,
> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > >>> >> >>
> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > >>> >> >> ACPI memory and crashes while trying to access the same:
> > >>> >> >>
> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> > >>> >> >> -r`.img --reuse-cmdline -d
> > >>> >> >>
> > >>> >> >> [snip..]
> > >>> >> >>
> > >>> >> >> Reserved memory range
> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> > >>> >> >>
> > >>> >> >> Coredump memory ranges
> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> > >>> >> >> 000000002e800000-000000003961ffff (0)
> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> > >>> >> >> 000000a000000000-000000affbffffff (0)
> > >>> >> >>
> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > >>> >> >> memory cap'ing passed to the crash kernel inside
> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> > >>> >> >>
> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> > >>> >> >> {
> > >>> >> >>         struct memblock_region reg = {
> > >>> >> >>                 .size = 0,
> > >>> >> >>         };
> > >>> >> >>
> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > >>> >> >>
> > >>> >> >>         if (reg.size)
> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> > >>> >> >> comment this out */
> > >>> >> >> }
> > >>> >> >
> > >>> >> > Please just don't do that. It can cause a fatal damage on
> > >>> >> > memory contents of the *crashed* kernel.
> > >>> >> >
> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> > >>> >> >>
> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > >>> >> >> fail.
> > >>> >> >>
> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > >>> >> >> dt node 'linux,usable-memory-range'
> > >>> >> >
> > >>> >> > I still don't understand why we need to carry over the information
> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > >>> >> > such regions are free to be reused by the kernel after some point of
> > >>> >> > initialization. Why does crash dump kernel need to know about them?
> > >>> >> >
> > >>> >>
> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> > >>> >> kernel, those regions needs to be preserved, which is why they are
> > >>> >> memblock_reserve()'d now.
> > >>> >
> > >>> > For my better understandings, who is actually accessing such regions
> > >>> > during boot time, uefi itself or efistub?
> > >>> >
> > >>>
> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> > >>> instance, on QEMU we have
> > >>>
> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> > >>>   01000013)
> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> > >>> BXPC 00000001)
> > >>>
> > >>> covered by
> > >>>
> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >>>  ...
> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > >>
> > >> OK. I mistakenly understood those regions could be freed after exiting
> > >> UEFI boot services.
> > >>
> > >>>
> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> > >>> >> when booting the next kernel.
> > >>> >
> > >>> > not really.
> > >>> >
> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> > >>> >> > on crash dump kernel?)
> > >>> >> >
> > >>> >>
> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> > >>> >> regions only revealed the bug, not created it (given that other
> > >>> >> memblock_reserve regions may be affected as well)
> > >>> >
> > >>> > As whether we should honor such reserved regions over kexec'ing
> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
> > >>> > As a matter of fact, no information about "reserved" memblocks is
> > >>> > exposed to user space (via proc/iomem).
> > >>> >
> > >>>
> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> > >>> as 'System RAM'. Do you think that could solve this?
> > >>
> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
> > >> marking them under another name in /proc/iomem would also be good in order
> > >> not to allocate them as part of crash kernel's memory.
> > >>
> > >
> > > I agree. However, this may not be entirely trivial, since iterating
> > > over the memblock_reserved table and creating iomem entries may result
> > > in collisions.
> > 
> > I found a method (using the patch I shared earlier in this thread) to mark these
> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> > reserved regions.
> > 
> > >> But I'm not still convinced that we should export them in useable-
> > >> memory-range to crash dump kernel. They will be accessed through
> > >> acpi_os_map_memory() and so won't be required to be part of system ram
> > >> (or memblocks), I guess.
> > >
> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > > which is exactly what we want in this case.
> > 
> > Now this is what is confusing me. I don't see the above happening.
> > 
> > I see that the primary kernel boots up and adds the ACPI regions via:
> > acpi_os_ioremap
> >     -> ioremap_cache
> > 
> > But during the crashkernel boot, 'acpi_os_ioremap' calls
> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> > variant.

That is expected if the region is outside memblock memory.

> > And it fails while accessing the ACPI tables:
> > 
> > [    0.039205] ACPI: Core revision 20170728
> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP

This (ESR = 0x96000021) means that a Data Abort caused by an Alignment fault
happened. As ioremap() maps the region as "Device memory", unaligned memory
accesses are not allowed.

> > [    0.100022] Modules linked in:
> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> > pstate: 60000045
> > [    0.132647] sp : ffff000008ccfb40
> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> > [    0.223224] Call trace:
> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> > ffff0000095e3980 ffff000008ccfbe0
> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> > ffff000008ccfc50 0000000000000000
> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> > 00000000ffffff76 0000000000000006
> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> > 000000000000038e 0000000000000000
> > [    0.263843] fa80: 0000000000000000 0000000000000000
> > 0000000000000005 000000000000001b
> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> > ffff000009710027 0000000000000001
> > [    0.279667] fac0: 0000000000000001 000000000000001b
> > 0000000000000000 ffff0000088be820
> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> > ffff00000849b4f8 ffff000008ccfb40
> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> > ffff000008ccfb40 ffff000008260a18
> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> > ffff000008ccfb40 ffff0000084a6764
> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> > [    0.399160] Kernel panic - not syncing: Fatal exception
> > [    0.404437] Rebooting in 10 seconds.
> > 
> > So, I think the linear mapping done by the primary kernel does not
> > make these accessible in the crash kernel directly.
> > 
> > Any pointers?
> 
> Can you get the code line number for acpi_ns_lookup+0x25c?

So should we avoid ioremap() in acpi_os_ioremap() entirely, or
modify acpi_ns_lookup() (or any other ACPI functions) to prevent unaligned
accesses?
(I could not find out how unaligned accesses could happen there.)

Thanks,
-Takahiro AKASHI

> > 
> > Regards,
> > Bhupesh
> > 
> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > >> via a kernel command line parameter, "memmap=".
> > >>
> > _______________________________________________
> > kexec mailing list -- kexec@lists.fedoraproject.org
> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  5:54                                                               ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18  5:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
> 
> Also add linux-acpi list

Thank you.

> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> > >>> <takahiro.akashi@linaro.org> wrote:
> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > >>> >> <takahiro.akashi@linaro.org> wrote:
> > >>> >> > Bhupesh, Ard,
> > >>> >> >
> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > >>> >> >> Hi Ard, Akashi
> > >>> >> >>
> > >>> >> > (snip)
> > >>> >> >
> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > >>> >> >> , for details)
> > >>> >> >
> > >>> >> > Right.
> > >>> >> >
> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > >>> >> >> with the crashkernel memory range:
> > >>> >> >>
> > >>> >> >>                 /* add linux,usable-memory-range */
> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > >>> >> >>                                 address_cells, size_cells);
> > >>> >> >>
> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > >>> >> >> , for details)
> > >>> >> >>
> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > >>> >> >> they are marked as System RAM or as RESERVED. As,
> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > >>> >> >>
> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > >>> >> >> ACPI memory and crashes while trying to access the same:
> > >>> >> >>
> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> > >>> >> >> -r`.img --reuse-cmdline -d
> > >>> >> >>
> > >>> >> >> [snip..]
> > >>> >> >>
> > >>> >> >> Reserved memory range
> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> > >>> >> >>
> > >>> >> >> Coredump memory ranges
> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> > >>> >> >> 000000002e800000-000000003961ffff (0)
> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> > >>> >> >> 000000a000000000-000000affbffffff (0)
> > >>> >> >>
> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > >>> >> >> memory cap'ing passed to the crash kernel inside
> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> > >>> >> >>
> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> > >>> >> >> {
> > >>> >> >>         struct memblock_region reg = {
> > >>> >> >>                 .size = 0,
> > >>> >> >>         };
> > >>> >> >>
> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > >>> >> >>
> > >>> >> >>         if (reg.size)
> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> > >>> >> >> comment this out */
> > >>> >> >> }
> > >>> >> >
> > >>> >> > Please just don't do that. It can cause a fatal damage on
> > >>> >> > memory contents of the *crashed* kernel.
> > >>> >> >
> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> > >>> >> >>
> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > >>> >> >> fail.
> > >>> >> >>
> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > >>> >> >> dt node 'linux,usable-memory-range'
> > >>> >> >
> > >>> >> > I still don't understand why we need to carry over the information
> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > >>> >> > such regions are free to be reused by the kernel after some point of
> > >>> >> > initialization. Why does crash dump kernel need to know about them?
> > >>> >> >
> > >>> >>
> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> > >>> >> kernel, those regions needs to be preserved, which is why they are
> > >>> >> memblock_reserve()'d now.
> > >>> >
> > >>> > For my better understandings, who is actually accessing such regions
> > >>> > during boot time, uefi itself or efistub?
> > >>> >
> > >>>
> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> > >>> instance, on QEMU we have
> > >>>
> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> > >>>   01000013)
> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> > >>> BXPC 00000001)
> > >>>
> > >>> covered by
> > >>>
> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >>>  ...
> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > >>
> > >> OK. I mistakenly understood those regions could be freed after exiting
> > >> UEFI boot services.
> > >>
> > >>>
> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> > >>> >> when booting the next kernel.
> > >>> >
> > >>> > not really.
> > >>> >
> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> > >>> >> > on crash dump kernel?)
> > >>> >> >
> > >>> >>
> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> > >>> >> regions only revealed the bug, not created it (given that other
> > >>> >> memblock_reserve regions may be affected as well)
> > >>> >
> > >>> > As whether we should honor such reserved regions over kexec'ing
> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
> > >>> > As a matter of fact, no information about "reserved" memblocks is
> > >>> > exposed to user space (via proc/iomem).
> > >>> >
> > >>>
> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> > >>> as 'System RAM'. Do you think that could solve this?
> > >>
> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
> > >> marking them under another name in /proc/iomem would also be good in order
> > >> not to allocate them as part of crash kernel's memory.
> > >>
> > >
> > > I agree. However, this may not be entirely trivial, since iterating
> > > over the memblock_reserved table and creating iomem entries may result
> > > in collisions.
> > 
> > I found a method (using the patch I shared earlier in this thread) to mark these
> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> > reserved regions.
> > 
> > >> But I'm not still convinced that we should export them in useable-
> > >> memory-range to crash dump kernel. They will be accessed through
> > >> acpi_os_map_memory() and so won't be required to be part of system ram
> > >> (or memblocks), I guess.
> > >
> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > > which is exactly what we want in this case.
> > 
> > Now this is what is confusing me. I don't see the above happening.
> > 
> > I see that the primary kernel boots up and adds the ACPI regions via:
> > acpi_os_ioremap
> >     -> ioremap_cache
> > 
> > But during the crashkernel boot, 'acpi_os_ioremap' calls
> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> > variant.

That is expected if the region is outside memblock memory.

> > And it fails while accessing the ACPI tables:
> > 
> > [    0.039205] ACPI: Core revision 20170728
> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP

This (ESR = 0x96000021) means that a Data Abort caused by an Alignment fault
happened. As ioremap() maps the region as "Device memory", unaligned memory
accesses are not allowed.

> > [    0.100022] Modules linked in:
> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> > pstate: 60000045
> > [    0.132647] sp : ffff000008ccfb40
> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> > [    0.223224] Call trace:
> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> > ffff0000095e3980 ffff000008ccfbe0
> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> > ffff000008ccfc50 0000000000000000
> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> > 00000000ffffff76 0000000000000006
> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> > 000000000000038e 0000000000000000
> > [    0.263843] fa80: 0000000000000000 0000000000000000
> > 0000000000000005 000000000000001b
> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> > ffff000009710027 0000000000000001
> > [    0.279667] fac0: 0000000000000001 000000000000001b
> > 0000000000000000 ffff0000088be820
> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> > ffff00000849b4f8 ffff000008ccfb40
> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> > ffff000008ccfb40 ffff000008260a18
> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> > ffff000008ccfb40 ffff0000084a6764
> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> > [    0.399160] Kernel panic - not syncing: Fatal exception
> > [    0.404437] Rebooting in 10 seconds.
> > 
> > So, I think the linear mapping done by the primary kernel does not
> > make these accessible in the crash kernel directly.
> > 
> > Any pointers?
> 
> Can you get the code line number for acpi_ns_lookup+0x25c?

So should we avoid ioremap() in acpi_os_ioremap() entirely, or
modify acpi_ns_lookup() (or any other ACPI functions) to prevent
unaligned accesses?
(I have not worked out how an unaligned access could happen there.)

Thanks,
-Takahiro AKASHI

> > 
> > Regards,
> > Bhupesh
> > 
> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > >> via a kernel command line parameter, "memmap=".
> > >>
> > _______________________________________________
> > kexec mailing list -- kexec at lists.fedoraproject.org
> > To unsubscribe send an email to kexec-leave at lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  5:54                                                               ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18  5:54 UTC (permalink / raw)
  To: Dave Young
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming,
	Bhupesh Sharma, kexec, linux-kernel, linux-acpi, James Morse,
	Bhupesh SHARMA, linux-arm-kernel

On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
> 
> Also add linux-acpi list

Thank you.

> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> > >>> <takahiro.akashi@linaro.org> wrote:
> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > >>> >> <takahiro.akashi@linaro.org> wrote:
> > >>> >> > Bhupesh, Ard,
> > >>> >> >
> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > >>> >> >> Hi Ard, Akashi
> > >>> >> >>
> > >>> >> > (snip)
> > >>> >> >
> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > >>> >> >> , for details)
> > >>> >> >
> > >>> >> > Right.
> > >>> >> >
> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > >>> >> >> with the crashkernel memory range:
> > >>> >> >>
> > >>> >> >>                 /* add linux,usable-memory-range */
> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > >>> >> >>                                 address_cells, size_cells);
> > >>> >> >>
> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > >>> >> >> , for details)
> > >>> >> >>
> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > >>> >> >> they are marked as System RAM or as RESERVED. As,
> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > >>> >> >>
> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > >>> >> >> ACPI memory and crashes while trying to access the same:
> > >>> >> >>
> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> > >>> >> >> -r`.img --reuse-cmdline -d
> > >>> >> >>
> > >>> >> >> [snip..]
> > >>> >> >>
> > >>> >> >> Reserved memory range
> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> > >>> >> >>
> > >>> >> >> Coredump memory ranges
> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> > >>> >> >> 000000002e800000-000000003961ffff (0)
> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> > >>> >> >> 000000a000000000-000000affbffffff (0)
> > >>> >> >>
> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > >>> >> >> memory cap'ing passed to the crash kernel inside
> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> > >>> >> >>
> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> > >>> >> >> {
> > >>> >> >>         struct memblock_region reg = {
> > >>> >> >>                 .size = 0,
> > >>> >> >>         };
> > >>> >> >>
> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > >>> >> >>
> > >>> >> >>         if (reg.size)
> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> > >>> >> >> comment this out */
> > >>> >> >> }
> > >>> >> >
> > >>> >> > Please just don't do that. It can cause a fatal damage on
> > >>> >> > memory contents of the *crashed* kernel.
> > >>> >> >
> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> > >>> >> >>
> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > >>> >> >> fail.
> > >>> >> >>
> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > >>> >> >> dt node 'linux,usable-memory-range'
> > >>> >> >
> > >>> >> > I still don't understand why we need to carry over the information
> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > >>> >> > such regions are free to be reused by the kernel after some point of
> > >>> >> > initialization. Why does crash dump kernel need to know about them?
> > >>> >> >
> > >>> >>
> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> > >>> >> kernel, those regions needs to be preserved, which is why they are
> > >>> >> memblock_reserve()'d now.
> > >>> >
> > >>> > For my better understandings, who is actually accessing such regions
> > >>> > during boot time, uefi itself or efistub?
> > >>> >
> > >>>
> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> > >>> instance, on QEMU we have
> > >>>
> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> > >>>   01000013)
> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> > >>> BXPC 00000001)
> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> > >>> BXPC 00000001)
> > >>>
> > >>> covered by
> > >>>
> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >>>  ...
> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > >>
> > >> OK. I mistakenly understood those regions could be freed after exiting
> > >> UEFI boot services.
> > >>
> > >>>
> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> > >>> >> when booting the next kernel.
> > >>> >
> > >>> > not really.
> > >>> >
> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> > >>> >> > on crash dump kernel?)
> > >>> >> >
> > >>> >>
> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> > >>> >> regions only revealed the bug, not created it (given that other
> > >>> >> memblock_reserve regions may be affected as well)
> > >>> >
> > >>> > As whether we should honor such reserved regions over kexec'ing
> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
> > >>> > As a matter of fact, no information about "reserved" memblocks is
> > >>> > exposed to user space (via proc/iomem).
> > >>> >
> > >>>
> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> > >>> as 'System RAM'. Do you think that could solve this?
> > >>
> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
> > >> marking them under another name in /proc/iomem would also be good in order
> > >> not to allocate them as part of crash kernel's memory.
> > >>
> > >
> > > I agree. However, this may not be entirely trivial, since iterating
> > > over the memblock_reserved table and creating iomem entries may result
> > > in collisions.
> > 
> > I found a method (using the patch I shared earlier in this thread) to mark these
> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> > reserved regions.
> > 
> > >> But I'm not still convinced that we should export them in useable-
> > >> memory-range to crash dump kernel. They will be accessed through
> > >> acpi_os_map_memory() and so won't be required to be part of system ram
> > >> (or memblocks), I guess.
> > >
> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > > which is exactly what we want in this case.
> > 
> > Now this is what is confusing me. I don't see the above happening.
> > 
> > I see that the primary kernel boots up and adds the ACPI regions via:
> > acpi_os_ioremap
> >     -> ioremap_cache
> > 
> > But during the crashkernel boot, ''acpi_os_ioremap' calls
> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> > variant.

That is expected if the region is not covered by memblock.
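
As a rough illustration, here is a small user-space model of that
decision. It assumes acpi_os_ioremap() picks the mapping type from
whether the physical address is covered by memblock; the struct, enum
and function names are hypothetical and this is not the kernel code.
The single range is the "Reserved memory range" from the kexec -d
output quoted above, i.e. what is left after capping to
linux,usable-memory-range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a memory range known to the kernel (inclusive end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

enum map_type { MAP_DEVICE, MAP_NORMAL_CACHED };

/* True if phys lies inside any of the n ranges. */
static bool range_covers(const struct mem_range *r, int n, uint64_t phys)
{
	for (int i = 0; i < n; i++)
		if (phys >= r[i].start && phys <= r[i].end)
			return true;
	return false;
}

/*
 * Assumed policy: addresses covered by memblock get a cacheable mapping
 * (the ioremap_cache() path); everything else gets a Device mapping
 * (the ioremap() path).
 */
static enum map_type pick_acpi_mapping(const struct mem_range *memblock,
				       int n, uint64_t phys)
{
	return range_covers(memblock, n, phys) ? MAP_NORMAL_CACHED : MAP_DEVICE;
}

/* Memory visible to the crash kernel after the cap is applied. */
static const struct mem_range crash_kernel_mem[] = {
	{ 0x0e800000ULL, 0x2e7fffffULL },
};
```

With these inputs, pick_acpi_mapping(crash_kernel_mem, 1, 0x39710000)
selects the Device mapping, where 0x39710000 is the output address in
the faulting pte (00e8000039710707) from the oops.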

> > And it fails while accessing the ACPI tables:
> > 
> > [    0.039205] ACPI: Core revision 20170728
> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP

This (ESR = 0x96000021) means that a Data Abort with an Alignment fault
occurred. Because ioremap() creates the mapping as "Device memory",
unaligned memory accesses are not allowed.
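
For reference, a minimal sketch of how that ESR value breaks down. The
field positions follow the ESR_ELx layout in the ARMv8 architecture
manual (EC in bits [31:26], DFSC in bits [5:0]); the helper and macro
names are made up for this example and are not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception class, ESR_ELx bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data fault status code, ESR_ELx bits [5:0] (meaningful for data aborts). */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

#define EC_DABT_CUR_EL	0x25	/* data abort, no change in exception level */
#define DFSC_ALIGNMENT	0x21	/* alignment fault */

static bool is_alignment_data_abort(uint32_t esr)
{
	return esr_ec(esr) == EC_DABT_CUR_EL &&
	       esr_dfsc(esr) == DFSC_ALIGNMENT;
}

/* True if addr is a multiple of size (size must be a power of two). */
static bool is_aligned(uint64_t addr, unsigned int size)
{
	return (addr & (size - 1)) == 0;
}
```

For 0x96000021 this gives EC = 0x25 and DFSC = 0x21. If the trapped
instruction word in the "Code:" line, b94002c0, is read as a 32-bit
"ldr w0, [x22]" (a hand decoding, worth confirming with objdump), then
the access is a 4-byte load from x22 = ffff000009710027, and
is_aligned(0xffff000009710027, 4) is false.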

> > [    0.100022] Modules linked in:
> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> > pstate: 60000045
> > [    0.132647] sp : ffff000008ccfb40
> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> > [    0.223224] Call trace:
> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> > ffff0000095e3980 ffff000008ccfbe0
> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> > ffff000008ccfc50 0000000000000000
> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> > 00000000ffffff76 0000000000000006
> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> > 000000000000038e 0000000000000000
> > [    0.263843] fa80: 0000000000000000 0000000000000000
> > 0000000000000005 000000000000001b
> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> > ffff000009710027 0000000000000001
> > [    0.279667] fac0: 0000000000000001 000000000000001b
> > 0000000000000000 ffff0000088be820
> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> > ffff00000849b4f8 ffff000008ccfb40
> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> > ffff000008ccfb40 ffff000008260a18
> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> > ffff000008ccfb40 ffff0000084a6764
> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> > [    0.399160] Kernel panic - not syncing: Fatal exception
> > [    0.404437] Rebooting in 10 seconds.
> > 
> > So, I think the linear mapping done by the primary kernel does not
> > make these accessible in the crash kernel directly.
> > 
> > Any pointers?
> 
> Can you get the code line number for acpi_ns_lookup+0x25c?

So should we avoid ioremap() in acpi_os_ioremap() entirely, or
modify acpi_ns_lookup() (or any other ACPI functions) to prevent
unaligned accesses?
(I have not worked out how an unaligned access could happen there.)

Thanks,
-Takahiro AKASHI

> > 
> > Regards,
> > Bhupesh
> > 
> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > >> via a kernel command line parameter, "memmap=".
> > >>
> > _______________________________________________
> > kexec mailing list -- kexec@lists.fedoraproject.org
> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18  5:54                                                               ` AKASHI Takahiro
  (?)
  (?)
@ 2017-12-18  8:59                                                                 ` Bhupesh SHARMA
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-12-18  8:59 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, James Morse,
	Bhupesh SHARMA, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland,
	Matt Fleming

On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
>> to kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>>
>> Also add linux-acpi list
>
> Thank you.
>
>> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> > <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> > >>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> > >>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > >>> >> > Bhupesh, Ard,
>> > >>> >> >
>> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> > >>> >> >> Hi Ard, Akashi
>> > >>> >> >>
>> > >>> >> > (snip)
>> > >>> >> >
>> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> > >>> >> >> , for details)
>> > >>> >> >
>> > >>> >> > Right.
>> > >>> >> >
>> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> > >>> >> >> with the crashkernel memory range:
>> > >>> >> >>
>> > >>> >> >>                 /* add linux,usable-memory-range */
>> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> > >>> >> >>                                 address_cells, size_cells);
>> > >>> >> >>
>> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> > >>> >> >> , for details)
>> > >>> >> >>
>> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> > >>> >> >> they are marked as System RAM or as RESERVED. As,
>> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> > >>> >> >>
>> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> > >>> >> >>
>> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> > >>> >> >> -r`.img --reuse-cmdline -d
>> > >>> >> >>
>> > >>> >> >> [snip..]
>> > >>> >> >>
>> > >>> >> >> Reserved memory range
>> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> > >>> >> >>
>> > >>> >> >> Coredump memory ranges
>> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> > >>> >> >>
>> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> > >>> >> >> memory cap'ing passed to the crash kernel inside
>> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> > >>> >> >>
>> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> > >>> >> >> {
>> > >>> >> >>         struct memblock_region reg = {
>> > >>> >> >>                 .size = 0,
>> > >>> >> >>         };
>> > >>> >> >>
>> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> > >>> >> >>
>> > >>> >> >>         if (reg.size)
>> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> > >>> >> >> comment this out */
>> > >>> >> >> }
>> > >>> >> >
>> > >>> >> > Please just don't do that. It can cause a fatal damage on
>> > >>> >> > memory contents of the *crashed* kernel.
>> > >>> >> >
>> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> > >>> >> >>
>> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> > >>> >> >> fail.
>> > >>> >> >>
>> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> > >>> >> >> dt node 'linux,usable-memory-range'
>> > >>> >> >
>> > >>> >> > I still don't understand why we need to carry over the information
>> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> > >>> >> > such regions are free to be reused by the kernel after some point of
>> > >>> >> > initialization. Why does crash dump kernel need to know about them?
>> > >>> >> >
>> > >>> >>
>> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> > >>> >> kernel, those regions needs to be preserved, which is why they are
>> > >>> >> memblock_reserve()'d now.
>> > >>> >
>> > >>> > For my better understandings, who is actually accessing such regions
>> > >>> > during boot time, uefi itself or efistub?
>> > >>> >
>> > >>>
>> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> > >>> instance, on QEMU we have
>> > >>>
>> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> > >>>   01000013)
>> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> > >>> BXPC 00000001)
>> > >>>
>> > >>> covered by
>> > >>>
>> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> > >>>  ...
>> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> > >>
>> > >> OK. I mistakenly understood those regions could be freed after exiting
>> > >> UEFI boot services.
>> > >>
>> > >>>
>> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> > >>> >> when booting the next kernel.
>> > >>> >
>> > >>> > not really.
>> > >>> >
>> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> > >>> >> > on crash dump kernel?)
>> > >>> >> >
>> > >>> >>
>> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> > >>> >> regions only revealed the bug, not created it (given that other
>> > >>> >> memblock_reserve regions may be affected as well)
>> > >>> >
>> > >>> > As whether we should honor such reserved regions over kexec'ing
>> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> > >>> > As a matter of fact, no information about "reserved" memblocks is
>> > >>> > exposed to user space (via proc/iomem).
>> > >>> >
>> > >>>
>> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> > >>> as 'System RAM'. Do you think that could solve this?
>> > >>
>> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> > >> marking them under another name in /proc/iomem would also be good in order
>> > >> not to allocate them as part of crash kernel's memory.
>> > >>
>> > >
>> > > I agree. However, this may not be entirely trivial, since iterating
>> > > over the memblock_reserved table and creating iomem entries may result
>> > > in collisions.
>> >
>> > I found a method (using the patch I shared earlier in this thread) to mark these
>> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> > reserved regions.
>> >
>> > >> But I'm not still convinced that we should export them in useable-
>> > >> memory-range to crash dump kernel. They will be accessed through
>> > >> acpi_os_map_memory() and so won't be required to be part of system ram
>> > >> (or memblocks), I guess.
>> > >
>> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > > which is exactly what we want in this case.
>> >
>> > Now this is what is confusing me. I don't see the above happening.
>> >
>> > I see that the primary kernel boots up and adds the ACPI regions via:
>> > acpi_os_ioremap
>> >     -> ioremap_cache
>> >
>> > But during the crashkernel boot, ''acpi_os_ioremap' calls
>> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> > variant.
>
> It is natural if that region is out of memblocks.

Thanks for the confirmation. This was my understanding as well.

>> > And it fails while accessing the ACPI tables:
>> >
>> > [    0.039205] ACPI: Core revision 20170728
>> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>
> this (ESR = 0x96000021) means that Data Abort and Alignment fault happened.
> As ioremap() makes the mapping as "Device memory", unaligned memory
> access won't be allowed.
>
>> > [    0.100022] Modules linked in:
>> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> > pstate: 60000045
>> > [    0.132647] sp : ffff000008ccfb40
>> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> > [    0.223224] Call trace:
>> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> > ffff0000095e3980 ffff000008ccfbe0
>> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> > ffff000008ccfc50 0000000000000000
>> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> > 00000000ffffff76 0000000000000006
>> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> > 000000000000038e 0000000000000000
>> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> > 0000000000000005 000000000000001b
>> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> > ffff000009710027 0000000000000001
>> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> > 0000000000000000 ffff0000088be820
>> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> > ffff00000849b4f8 ffff000008ccfb40
>> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> > ffff000008ccfb40 ffff000008260a18
>> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> > ffff000008ccfb40 ffff0000084a6764
>> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> > [    0.404437] Rebooting in 10 seconds.
>> >
>> > So, I think the linear mapping done by the primary kernel does not
>> > make these accessible in the crash kernel directly.
>> >
>> > Any pointers?
>>
>> Can you get the code line number for acpi_ns_lookup+0x25c?
>
> So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned
> accesses?
> (I didn't find out how unaligned accesses could happen there.)
>

Right. As I noted earlier in this thread (perhaps in the first email
on this subject), this is indeed an unaligned access.

Now, modifying acpi_os_ioremap() so that it never calls ioremap(), and
thus never maps this range as Device memory, doesn't seem a neat
solution: it means we would not be marking something with the right
memory attribute, and we could run into similar or related issues later.

Regarding the latter suggestion, what I am seeing now is that the ACPI
table access functions are perhaps reused from the earlier x86
implementation, but on arm64 (or even arm) we should not be allowing
unaligned accesses to Device memory, since they raise an alignment
fault and crash the kernel.

So I can try this approach and see if it works for me.

However, I am still not sure why the crashkernel ranges
historically do not include the System RAM regions (which may include
the ACPI regions as well). These regions are available for kernel
use and perhaps should be exported to the crashkernel as well.

I am not fully aware of the previous discussions on capping the
crashkernel memory being passed to the kdump kernel, but did we run
into any issues while doing so?

Also, even if I extend kexec-tools to add the ACPI regions to
linux,usable-memory-range, the crashkernel fails to boot with the
message below (I have added some logic to print the DTB at the start
of the crash kernel boot):

[    0.000000]     chosen {
[    0.000000]         linux,usable-memory-range
[    0.000000]  = <
[    0.000000] 0x00000000
[    0.000000] 0x0e800000
[    0.000000] 0x00000000
[    0.000000] 0x20000000
[    0.000000] 0x00000000
[    0.000000] 0x396c0000
[    0.000000] 0x00000000
[    0.000000] 0x000a0000
[    0.000000] 0x00000000
[    0.000000] 0x39770000
[    0.000000] 0x00000000
[    0.000000] 0x00040000
[    0.000000] 0x00000000
[    0.000000] 0x398a0000
[    0.000000] 0x00000000
[    0.000000] 0x00020000
[    0.000000] >
[    0.000000] ;

[snip..]

[    0.000000] linux,usable-memory-range base e800000, size 20000000
[    0.000000]  - e800000 ,  20000000
[    0.000000] linux,usable-memory-range base 396c0000, size a0000
[    0.000000]  - 396c0000 ,  a0000
[    0.000000] linux,usable-memory-range base 39770000, size 40000
[    0.000000]  - 39770000 ,  40000
[    0.000000] linux,usable-memory-range base 398a0000, size 20000
[    0.000000]  - 398a0000 ,  20000
[    0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ...
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 arm64_memblock_init+0x210/0x484
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
[    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
[    0.000000] PC is at arm64_memblock_init+0x210/0x484
[    0.000000] LR is at arm64_memblock_init+0x210/0x484
[    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] pstate: 600000c5
[    0.000000] sp : ffff000008ccfe80
[    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
[    0.000000] x27: 0000000011230000 x26: 00000000013b0000
[    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
[    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
[    0.000000] x21: ffff000008afa000 x20: ffff000008080000
[    0.000000] x19: ffff000008afa000 x18: 000000000c283806
[    0.000000] x17: 0000000000000000 x16: ffff000008d05580
[    0.000000] x15: 000000002be00842 x14: 79206b6365686320
[    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
[    0.000000] x11: 6d207261656e696c x10: 2065687420616976
[    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
[    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
[    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
[    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
[    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
[    0.000000] Call trace:
[    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
[    0.000000] fd40: 0000000000000056 0000000000000000 0000000000000000 0000000000000000
[    0.000000] fd60: 0000000000000001 ffff000008c96360 000000000000000d 746f6f622072756f
[    0.000000] fd80: ffff000008517414 00000000000000f4 2065687420616976 6d207261656e696c
[    0.000000] fda0: 2d20676e69707061 657361656c70202d 79206b6365686320 000000002be00842
[    0.000000] fdc0: ffff000008d05580 0000000000000000 000000000c283806 ffff000008afa000
[    0.000000] fde0: ffff000008080000 ffff000008afa000 ffff000009680000 ffff000008ec0000
[    0.000000] fe00: ffff000008cf3000 000000000fe80000 00000000013b0000 0000000011230000
[    0.000000] fe20: 000000000f370018 ffff000008ccfe80 ffff000008b76984 ffff000008ccfe80
[    0.000000] fe40: ffff000008b76984 00000000600000c5 ffff00000959b7a8 ffff000008ec0000
[    0.000000] fe60: ffffffffffffffff 0000000000000005 ffff000008ccfe80 ffff000008b76984
[    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
[    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x50/0x6c with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
[    0.000000] cma: Failed to reserve 512 MiB
[    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W ------------   4.14.0+ #7
[    0.000000] Call trace:
[    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
[    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
[    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
[    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
[    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
[    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
[    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
[    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
[    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]

I guess it is because of the 1G alignment requirement between the
kernel image and the initrd, and the way kexec-tools populates the
holes between the kernel image, the other segments (including the dtb)
and the initrd.

Akashi, any pointers on this will be helpful as well.

Regards,
Bhupesh


>> >
>> > Regards,
>> > Bhupesh
>> >
>> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> > >> via a kernel command line parameter, "memmap=".
>> > >>
>> > _______________________________________________
>> > kexec mailing list -- kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org
>> > To unsubscribe send an email to kexec-leave-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  8:59                                                                 ` Bhupesh SHARMA
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-12-18  8:59 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse,
	Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming

On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
>> to kexec@lists.infradead.org
>>
>> Also add linux-acpi list
>
> Thank you.
>
>> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> > <ard.biesheuvel@linaro.org> wrote:
>> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > > <takahiro.akashi@linaro.org> wrote:
>> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> > >>> <takahiro.akashi@linaro.org> wrote:
>> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> > >>> >> <takahiro.akashi@linaro.org> wrote:
>> > >>> >> > Bhupesh, Ard,
>> > >>> >> >
>> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> > >>> >> >> Hi Ard, Akashi
>> > >>> >> >>
>> > >>> >> > (snip)
>> > >>> >> >
>> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> > >>> >> >> , for details)
>> > >>> >> >
>> > >>> >> > Right.
>> > >>> >> >
>> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> > >>> >> >> with the crashkernel memory range:
>> > >>> >> >>
>> > >>> >> >>                 /* add linux,usable-memory-range */
>> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> > >>> >> >>                                 address_cells, size_cells);
>> > >>> >> >>
>> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> > >>> >> >> , for details)
>> > >>> >> >>
>> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> > >>> >> >> they are marked as System RAM or as RESERVED. As,
>> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> > >>> >> >>
>> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> > >>> >> >>
>> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> > >>> >> >> -r`.img --reuse-cmdline -d
>> > >>> >> >>
>> > >>> >> >> [snip..]
>> > >>> >> >>
>> > >>> >> >> Reserved memory range
>> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> > >>> >> >>
>> > >>> >> >> Coredump memory ranges
>> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> > >>> >> >>
>> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> > >>> >> >> memory cap'ing passed to the crash kernel inside
>> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> > >>> >> >>
>> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> > >>> >> >> {
>> > >>> >> >>         struct memblock_region reg = {
>> > >>> >> >>                 .size = 0,
>> > >>> >> >>         };
>> > >>> >> >>
>> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> > >>> >> >>
>> > >>> >> >>         if (reg.size)
>> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> > >>> >> >> comment this out */
>> > >>> >> >> }
>> > >>> >> >
>> > >>> >> > Please just don't do that. It can cause a fatal damage on
>> > >>> >> > memory contents of the *crashed* kernel.
>> > >>> >> >
>> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> > >>> >> >>
>> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> > >>> >> >> fail.
>> > >>> >> >>
>> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> > >>> >> >> dt node 'linux,usable-memory-range'
>> > >>> >> >
>> > >>> >> > I still don't understand why we need to carry over the information
>> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> > >>> >> > such regions are free to be reused by the kernel after some point of
>> > >>> >> > initialization. Why does crash dump kernel need to know about them?
>> > >>> >> >
>> > >>> >>
>> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> > >>> >> kernel, those regions needs to be preserved, which is why they are
>> > >>> >> memblock_reserve()'d now.
>> > >>> >
>> > >>> > For my better understandings, who is actually accessing such regions
>> > >>> > during boot time, uefi itself or efistub?
>> > >>> >
>> > >>>
>> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> > >>> instance, on QEMU we have
>> > >>>
>> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> > >>>   01000013)
>> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> > >>> BXPC 00000001)
>> > >>>
>> > >>> covered by
>> > >>>
>> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> > >>>  ...
>> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> > >>
>> > >> OK. I mistakenly understood those regions could be freed after exiting
>> > >> UEFI boot services.
>> > >>
>> > >>>
>> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> > >>> >> when booting the next kernel.
>> > >>> >
>> > >>> > not really.
>> > >>> >
>> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> > >>> >> > on crash dump kernel?)
>> > >>> >> >
>> > >>> >>
>> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> > >>> >> regions only revealed the bug, not created it (given that other
>> > >>> >> memblock_reserve regions may be affected as well)
>> > >>> >
>> > >>> > As whether we should honor such reserved regions over kexec'ing
>> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> > >>> > As a matter of fact, no information about "reserved" memblocks is
>> > >>> > exposed to user space (via proc/iomem).
>> > >>> >
>> > >>>
>> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> > >>> as 'System RAM'. Do you think that could solve this?
>> > >>
>> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> > >> marking them under another name in /proc/iomem would also be good in order
>> > >> not to allocate them as part of crash kernel's memory.
>> > >>
>> > >
>> > > I agree. However, this may not be entirely trivial, since iterating
>> > > over the memblock_reserved table and creating iomem entries may result
>> > > in collisions.
>> >
>> > I found a method (using the patch I shared earlier in this thread) to mark these
>> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> > reserved regions.
>> >
>> > >> But I'm not still convinced that we should export them in useable-
>> > >> memory-range to crash dump kernel. They will be accessed through
>> > >> acpi_os_map_memory() and so won't be required to be part of system ram
>> > >> (or memblocks), I guess.
>> > >
>> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > > which is exactly what we want in this case.
>> >
>> > Now this is what is confusing me. I don't see the above happening.
>> >
>> > I see that the primary kernel boots up and adds the ACPI regions via:
>> > acpi_os_ioremap
>> >     -> ioremap_cache
>> >
>> > But during the crashkernel boot, ''acpi_os_ioremap' calls
>> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> > variant.
>
> It is natural if that region is out of memblocks.

Thanks for the confirmation. This was my understanding as well.

>> > And it fails while accessing the ACPI tables:
>> >
>> > [    0.039205] ACPI: Core revision 20170728
>> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>
> this (ESR = 0x96000021) means that Data Abort and Alignment fault happened.
> As ioremap() makes the mapping as "Device memory", unaligned memory
> access won't be allowed.
>
>> > [    0.100022] Modules linked in:
>> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> > pstate: 60000045
>> > [    0.132647] sp : ffff000008ccfb40
>> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> > [    0.223224] Call trace:
>> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> > ffff0000095e3980 ffff000008ccfbe0
>> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> > ffff000008ccfc50 0000000000000000
>> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> > 00000000ffffff76 0000000000000006
>> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> > 000000000000038e 0000000000000000
>> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> > 0000000000000005 000000000000001b
>> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> > ffff000009710027 0000000000000001
>> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> > 0000000000000000 ffff0000088be820
>> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> > ffff00000849b4f8 ffff000008ccfb40
>> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> > ffff000008ccfb40 ffff000008260a18
>> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> > ffff000008ccfb40 ffff0000084a6764
>> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> > [    0.404437] Rebooting in 10 seconds.
>> >
>> > So, I think the linear mapping done by the primary kernel does not
>> > make these accessible in the crash kernel directly.
>> >
>> > Any pointers?
>>
>> Can you get the code line number for acpi_ns_lookup+0x25c?
>
> So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned
> accesses?
> (I didn't find out how unaligned accesses could happen there.)
>

Right. As I noted earlier in this thread (perhaps in the first email
on this subject), this is indeed an unaligned access.
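
For reference, the syndrome value in the oops can be decoded by hand.
The sketch below is a minimal, standalone C illustration and not kernel
code: the helper names (esr_ec, esr_dfsc, esr_is_alignment_dabt,
is_unaligned) are made up for this example, the field positions follow
the ESR_ELx layout in the Arm architecture manual, and the 4-byte access
width is only my reading of the faulting opcode (b94002c0, which looks
like a 32-bit load through x22).

```c
#include <assert.h>
#include <stdint.h>

/* ESR_ELx fields: exception class in bits [31:26], data fault status in bits [5:0]. */
#define EC_DABT_SAME_EL  0x25u  /* Data Abort taken without a change in Exception level */
#define DFSC_ALIGNMENT   0x21u  /* Alignment fault */

/* Hypothetical helper: extract the exception class from a syndrome value. */
unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3fu;
}

/* Hypothetical helper: extract the data fault status code. */
unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3fu;
}

/* True if the syndrome describes an alignment fault on a data access at the same EL. */
int esr_is_alignment_dabt(uint32_t esr)
{
	return esr_ec(esr) == EC_DABT_SAME_EL && esr_dfsc(esr) == DFSC_ALIGNMENT;
}

/* True if addr is not a multiple of width; width must be a power of two. */
int is_unaligned(uint64_t addr, unsigned int width)
{
	assert(width != 0 && (width & (width - 1)) == 0);
	return (addr & (uint64_t)(width - 1)) != 0;
}
```

Feeding in 0x96000021 from the "Internal error: Oops" line gives EC 0x25
and DFSC 0x21, and x22 = ffff000009710027 from the register dump is not
a multiple of 4, which is consistent with Akashi's reading above.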

Now, modifying acpi_os_ioremap() so that it never calls ioremap(), and
thus never maps this range as Device memory, doesn't seem a neat
solution: it means we would not be marking something with the right
memory attribute, and we could run into similar or related issues later.
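
To illustrate why the two kernels end up with different mappings, here
is a small userspace model of the decision, under my assumption that
the arm64 acpi_os_ioremap() picks ioremap_cache() when the address is
known to memblock as memory and plain ioremap() otherwise. Everything
in it is hypothetical scaffolding: struct range, in_memory() and
acpi_map_type() are invented names, first_kernel_mem is a deliberately
simplified stand-in for the first kernel's memory map, and
crash_kernel_mem is the single crashkernel range from the kexec output
quoted above. The physical address 0x39710000 is the one visible in the
pte value of the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t base;
	uint64_t size;
};

enum map_type {
	MAP_DEVICE,         /* what plain ioremap() would give */
	MAP_NORMAL_CACHED   /* what ioremap_cache() would give */
};

/* Model of a "is this address registered as memory?" lookup. */
int in_memory(const struct range *mem, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= mem[i].base && addr - mem[i].base < mem[i].size)
			return 1;
	return 0;
}

/* Known memory gets a cacheable mapping, anything else a Device mapping. */
enum map_type acpi_map_type(const struct range *mem, size_t n, uint64_t phys)
{
	assert(mem != NULL || n == 0);
	return in_memory(mem, n, phys) ? MAP_NORMAL_CACHED : MAP_DEVICE;
}

/* Simplified assumption: the first kernel sees low RAM as one block. */
const struct range first_kernel_mem[] = {
	{ 0x00000000ULL, 0x3fc00000ULL },
};

/* The crash kernel is capped to the reserved crashkernel range only. */
const struct range crash_kernel_mem[] = {
	{ 0x0e800000ULL, 0x20000000ULL },
};
```

In this model the table at 0x39710000 is cacheable in the first kernel
but becomes a Device mapping once memory is capped to the crashkernel
range, which is the situation where an unaligned load faults.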

Regarding the latter suggestion, what I am seeing now is that the ACPI
table access functions are perhaps reused from the earlier x86
implementation, but on arm64 (or even arm) we should not be allowing
unaligned accesses to Device memory, since they raise an alignment
fault and crash the kernel.

So I can try this approach and see if it works for me.

However, I am still not sure why the crashkernel ranges
historically do not include the System RAM regions (which may include
the ACPI regions as well). These regions are available for kernel
use and perhaps should be exported to the crashkernel as well.

I am not fully aware of the previous discussions on capping the
crashkernel memory being passed to the kdump kernel, but did we run
into any issues while doing so?

Also, even if I extend kexec-tools to add the ACPI regions to
linux,usable-memory-range, the crashkernel fails to boot with the
message below (I have added some logic to print the DTB at the start
of the crash kernel boot):

[    0.000000]     chosen {
[    0.000000]         linux,usable-memory-range
[    0.000000]  = <
[    0.000000] 0x00000000
[    0.000000] 0x0e800000
[    0.000000] 0x00000000
[    0.000000] 0x20000000
[    0.000000] 0x00000000
[    0.000000] 0x396c0000
[    0.000000] 0x00000000
[    0.000000] 0x000a0000
[    0.000000] 0x00000000
[    0.000000] 0x39770000
[    0.000000] 0x00000000
[    0.000000] 0x00040000
[    0.000000] 0x00000000
[    0.000000] 0x398a0000
[    0.000000] 0x00000000
[    0.000000] 0x00020000
[    0.000000] >
[    0.000000] ;
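
The sixteen cells in the dump above are easier to read as (base, size)
pairs. A minimal sketch of that decoding follows, assuming two address
cells and two size cells per entry and cells already converted to CPU
byte order (a real FDT stores them big-endian). struct mem_range,
read_cells() and parse_usable_ranges() are names invented for this
illustration and are not kexec-tools or kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine 1 or 2 consecutive 32-bit cells into one 64-bit value. */
static uint64_t read_cells(const uint32_t *p, int cells)
{
	uint64_t v = 0;

	while (cells--)
		v = (v << 32) | *p++;
	return v;
}

/*
 * Split a property of ncells cells into (base, size) pairs.
 * Returns the number of ranges written, or -1 if the property length is
 * not a whole number of entries or does not fit in out[].
 */
int parse_usable_ranges(const uint32_t *prop, size_t ncells,
			int addr_cells, int size_cells,
			struct mem_range *out, size_t max_out)
{
	size_t stride, n, i;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);

	stride = (size_t)addr_cells + (size_t)size_cells;
	if (ncells % stride != 0)
		return -1;
	n = ncells / stride;
	if (n > max_out)
		return -1;

	for (i = 0; i < n; i++) {
		out[i].base = read_cells(prop + i * stride, addr_cells);
		out[i].size = read_cells(prop + i * stride + addr_cells, size_cells);
	}
	return (int)n;
}

/* The cells exactly as printed in the DTB dump. */
const uint32_t dumped_cells[16] = {
	0x00000000, 0x0e800000, 0x00000000, 0x20000000,
	0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
	0x00000000, 0x39770000, 0x00000000, 0x00040000,
	0x00000000, 0x398a0000, 0x00000000, 0x00020000,
};
```

Decoded this way, the property holds four ranges: the crashkernel
region at 0xe800000 (size 0x20000000) followed by the three ACPI
regions that my kexec-tools change appended.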

[snip..]

[    0.000000] linux,usable-memory-range base e800000, size 20000000
[    0.000000]  - e800000 ,  20000000
[    0.000000] linux,usable-memory-range base 396c0000, size a0000
[    0.000000]  - 396c0000 ,  a0000
[    0.000000] linux,usable-memory-range base 39770000, size 40000
[    0.000000]  - 39770000 ,  40000
[    0.000000] linux,usable-memory-range base 398a0000, size 20000
[    0.000000]  - 398a0000 ,  20000
[    0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ...
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 arm64_memblock_init+0x210/0x484
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
[    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
[    0.000000] PC is at arm64_memblock_init+0x210/0x484
[    0.000000] LR is at arm64_memblock_init+0x210/0x484
[    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] pstate: 600000c5
[    0.000000] sp : ffff000008ccfe80
[    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
[    0.000000] x27: 0000000011230000 x26: 00000000013b0000
[    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
[    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
[    0.000000] x21: ffff000008afa000 x20: ffff000008080000
[    0.000000] x19: ffff000008afa000 x18: 000000000c283806
[    0.000000] x17: 0000000000000000 x16: ffff000008d05580
[    0.000000] x15: 000000002be00842 x14: 79206b6365686320
[    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
[    0.000000] x11: 6d207261656e696c x10: 2065687420616976
[    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
[    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
[    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
[    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
[    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
[    0.000000] Call trace:
[    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
[    0.000000] fd40: 0000000000000056 0000000000000000 0000000000000000 0000000000000000
[    0.000000] fd60: 0000000000000001 ffff000008c96360 000000000000000d 746f6f622072756f
[    0.000000] fd80: ffff000008517414 00000000000000f4 2065687420616976 6d207261656e696c
[    0.000000] fda0: 2d20676e69707061 657361656c70202d 79206b6365686320 000000002be00842
[    0.000000] fdc0: ffff000008d05580 0000000000000000 000000000c283806 ffff000008afa000
[    0.000000] fde0: ffff000008080000 ffff000008afa000 ffff000009680000 ffff000008ec0000
[    0.000000] fe00: ffff000008cf3000 000000000fe80000 00000000013b0000 0000000011230000
[    0.000000] fe20: 000000000f370018 ffff000008ccfe80 ffff000008b76984 ffff000008ccfe80
[    0.000000] fe40: ffff000008b76984 00000000600000c5 ffff00000959b7a8 ffff000008ec0000
[    0.000000] fe60: ffffffffffffffff 0000000000000005 ffff000008ccfe80 ffff000008b76984
[    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
[    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x50/0x6c with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
[    0.000000] cma: Failed to reserve 512 MiB
[    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W ------------   4.14.0+ #7
[    0.000000] Call trace:
[    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
[    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
[    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
[    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
[    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
[    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
[    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
[    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
[    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]

I guess it is because of the 1G alignment requirement between the
kernel image and the initrd, and the way kexec-tools populates the
holes between the kernel image, the other segments (including the dtb)
and the initrd.

Akashi, any pointers on this will be helpful as well.

Regards,
Bhupesh


>> >
>> > Regards,
>> > Bhupesh
>> >
>> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> > >> via a kernel command line parameter, "memmap=".
>> > >>
>> > _______________________________________________
>> > kexec mailing list -- kexec@lists.fedoraproject.org
>> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org


* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  8:59                                                                 ` Bhupesh SHARMA
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-12-18  8:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> kexec at fedoraproject... is for Fedora kexec scripts discussion, changed it
>> to kexec at lists.infradead.org
>>
>> Also add linux-acpi list
>
> Thank you.
>
>> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> > <ard.biesheuvel@linaro.org> wrote:
>> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > > <takahiro.akashi@linaro.org> wrote:
>> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> > >>> <takahiro.akashi@linaro.org> wrote:
>> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> > >>> >> <takahiro.akashi@linaro.org> wrote:
>> > >>> >> > Bhupesh, Ard,
>> > >>> >> >
>> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> > >>> >> >> Hi Ard, Akashi
>> > >>> >> >>
>> > >>> >> > (snip)
>> > >>> >> >
>> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> > >>> >> >> , for details)
>> > >>> >> >
>> > >>> >> > Right.
>> > >>> >> >
>> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> > >>> >> >> with the crashkernel memory range:
>> > >>> >> >>
>> > >>> >> >>                 /* add linux,usable-memory-range */
>> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> > >>> >> >>                                 address_cells, size_cells);
>> > >>> >> >>
>> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> > >>> >> >> , for details)
>> > >>> >> >>
>> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> > >>> >> >> they are marked as System RAM or as RESERVED. As,
>> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> > >>> >> >>
>> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> > >>> >> >>
>> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> > >>> >> >> -r`.img --reuse-cmdline -d
>> > >>> >> >>
>> > >>> >> >> [snip..]
>> > >>> >> >>
>> > >>> >> >> Reserved memory range
>> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> > >>> >> >>
>> > >>> >> >> Coredump memory ranges
>> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> > >>> >> >>
>> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> > >>> >> >> memory cap'ing passed to the crash kernel inside
>> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> > >>> >> >>
>> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> > >>> >> >> {
>> > >>> >> >>         struct memblock_region reg = {
>> > >>> >> >>                 .size = 0,
>> > >>> >> >>         };
>> > >>> >> >>
>> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> > >>> >> >>
>> > >>> >> >>         if (reg.size)
>> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> > >>> >> >> comment this out */
>> > >>> >> >> }
>> > >>> >> >
>> > >>> >> > Please just don't do that. It can cause a fatal damage on
>> > >>> >> > memory contents of the *crashed* kernel.
>> > >>> >> >
>> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> > >>> >> >>
>> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> > >>> >> >> fail.
>> > >>> >> >>
>> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> > >>> >> >> dt node 'linux,usable-memory-range'
>> > >>> >> >
>> > >>> >> > I still don't understand why we need to carry over the information
>> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> > >>> >> > such regions are free to be reused by the kernel after some point of
>> > >>> >> > initialization. Why does crash dump kernel need to know about them?
>> > >>> >> >
>> > >>> >>
>> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> > >>> >> kernel, those regions needs to be preserved, which is why they are
>> > >>> >> memblock_reserve()'d now.
>> > >>> >
>> > >>> > For my better understandings, who is actually accessing such regions
>> > >>> > during boot time, uefi itself or efistub?
>> > >>> >
>> > >>>
>> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> > >>> instance, on QEMU we have
>> > >>>
>> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> > >>>   01000013)
>> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> > >>> BXPC 00000001)
>> > >>>
>> > >>> covered by
>> > >>>
>> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> > >>>  ...
>> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> > >>
>> > >> OK. I mistakenly understood those regions could be freed after exiting
>> > >> UEFI boot services.
>> > >>
>> > >>>
>> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> > >>> >> when booting the next kernel.
>> > >>> >
>> > >>> > not really.
>> > >>> >
>> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> > >>> >> > on crash dump kernel?)
>> > >>> >> >
>> > >>> >>
>> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> > >>> >> regions only revealed the bug, not created it (given that other
>> > >>> >> memblock_reserve regions may be affected as well)
>> > >>> >
>> > >>> > As whether we should honor such reserved regions over kexec'ing
>> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> > >>> > As a matter of fact, no information about "reserved" memblocks is
>> > >>> > exposed to user space (via proc/iomem).
>> > >>> >
>> > >>>
>> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> > >>> as 'System RAM'. Do you think that could solve this?
>> > >>
>> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> > >> marking them under another name in /proc/iomem would also be good in order
>> > >> not to allocate them as part of crash kernel's memory.
>> > >>
>> > >
>> > > I agree. However, this may not be entirely trivial, since iterating
>> > > over the memblock_reserved table and creating iomem entries may result
>> > > in collisions.
>> >
>> > I found a method (using the patch I shared earlier in this thread) to mark these
>> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> > reserved regions.
>> >
>> > >> But I'm not still convinced that we should export them in useable-
>> > >> memory-range to crash dump kernel. They will be accessed through
>> > >> acpi_os_map_memory() and so won't be required to be part of system ram
>> > >> (or memblocks), I guess.
>> > >
>> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > > which is exactly what we want in this case.
>> >
>> > Now this is what is confusing me. I don't see the above happening.
>> >
>> > I see that the primary kernel boots up and adds the ACPI regions via:
>> > acpi_os_ioremap
>> >     -> ioremap_cache
>> >
>> > But during the crashkernel boot, 'acpi_os_ioremap' calls
>> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> > variant.
>
> It is natural if that region is out of memblocks.

Thanks for the confirmation. This was my understanding as well.

>> > And it fails while accessing the ACPI tables:
>> >
>> > [    0.039205] ACPI: Core revision 20170728
>> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>
> this (ESR = 0x96000021) means that Data Abort and Alignment fault happened.
> As ioremap() makes the mapping as "Device memory", unaligned memory
> access won't be allowed.
>
>> > [    0.100022] Modules linked in:
>> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> > pstate: 60000045
>> > [    0.132647] sp : ffff000008ccfb40
>> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> > [    0.223224] Call trace:
>> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> > ffff0000095e3980 ffff000008ccfbe0
>> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> > ffff000008ccfc50 0000000000000000
>> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> > 00000000ffffff76 0000000000000006
>> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> > 000000000000038e 0000000000000000
>> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> > 0000000000000005 000000000000001b
>> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> > ffff000009710027 0000000000000001
>> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> > 0000000000000000 ffff0000088be820
>> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> > ffff00000849b4f8 ffff000008ccfb40
>> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> > ffff000008ccfb40 ffff000008260a18
>> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> > ffff000008ccfb40 ffff0000084a6764
>> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> > [    0.404437] Rebooting in 10 seconds.
>> >
>> > So, I think the linear mapping done by the primary kernel does not
>> > make these accessible in the crash kernel directly.
>> >
>> > Any pointers?
>>
>> Can you get the code line number for acpi_ns_lookup+0x25c?
>
> So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned
> accesses?
> (I didn't find out how unaligned accesses could happen there.)
>

Right. As I noted earlier in this thread (perhaps in the first email on
this subject), this is indeed an unaligned access.
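
For reference, the syndrome value in the Oops can be decoded by hand.
The sketch below is only illustrative: the helper names are made up
(they are not kernel functions), and it assumes the ESR_ELx layout from
the Arm architecture manual, with EC in bits [31:26], IL in bit 25 and
the data fault status code in bits [5:0].

```c
#include <stdint.h>

/* Hypothetical helpers, not kernel APIs: split an ESR_ELx value into fields. */

/* Exception class, bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Instruction length bit, bit 25. */
static unsigned int esr_il(uint32_t esr)
{
	return (esr >> 25) & 0x1;
}

/* Data fault status code, bits [5:0] of the ISS. */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* Returns non-zero if addr is a multiple of size (size must be a power of two). */
static int is_aligned(uint64_t addr, unsigned int size)
{
	return (addr & (uint64_t)(size - 1)) == 0;
}
```

For 0x96000021 this gives EC = 0x25 (Data Abort taken without a change
in exception level), IL = 1 and DFSC = 0x21 (alignment fault). The value
of x22 in the register dump, ffff000009710027, is not 4-byte aligned,
which is consistent with that.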

Now, modifying acpi_os_ioremap() so that it does not call ioremap(), and
thus avoids mapping this range as Device memory, doesn't seem a neat
solution: it means we are not marking something with the right memory
attribute, and we may run into similar or related issues later.

Regarding the latter suggestion, what I am seeing now is that the ACPI
table access functions are perhaps reused from the earlier x86
implementation, but on arm64 (or even arm) we should not be allowing
unaligned accesses, which might cause UNDEFINED behaviour and a
resultant crash.

So I can try this approach and see if it works for me.
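
To illustrate what "preventing unaligned accesses" could look like, here
is a minimal sketch. It is not ACPICA code: read_le32_bytewise() is a
hypothetical helper, and it assumes the value is stored little-endian (as
ACPI table fields are). It builds the 32-bit value from four single-byte
loads, so no load wider than a byte is ever issued at an odd address.

```c
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not an existing ACPICA/kernel function):
 * read a little-endian 32-bit value from a possibly unaligned address.
 * The volatile qualifier keeps the compiler from merging the four byte
 * loads back into one (possibly unaligned) word load.
 */
static uint32_t read_le32_bytewise(const volatile uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

The cost is four loads instead of one, so this would only make sense on
paths that may really see misaligned pointers into Device mappings.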

However, I am still not very sure as to why the crashkernel ranges
historically do not include the System RAM regions (which may include
the ACPI regions as well). These regions are available for the kernel
usage and perhaps should be exported to the crashkernel as well.

I am not fully aware of the previous discussions on capping the
crashkernel memory being passed to the kdump kernel, but did we run
into any issues while doing so?

Also, even if I extend the kexec-tools to modify the
linux,usable-memory-range and add the ACPI regions to it, the
crashkernel fails to boot with the below message (I have added some
logic to print the DTB on the crash kernel boot start):

[    0.000000]     chosen {
[    0.000000]         linux,usable-memory-range
[    0.000000]  = <
[    0.000000] 0x00000000
[    0.000000] 0x0e800000
[    0.000000] 0x00000000
[    0.000000] 0x20000000
[    0.000000] 0x00000000
[    0.000000] 0x396c0000
[    0.000000] 0x00000000
[    0.000000] 0x000a0000
[    0.000000] 0x00000000
[    0.000000] 0x39770000
[    0.000000] 0x00000000
[    0.000000] 0x00040000
[    0.000000] 0x00000000
[    0.000000] 0x398a0000
[    0.000000] 0x00000000
[    0.000000] 0x00020000
[    0.000000] >
[    0.000000] ;

[snip..]

[    0.000000] linux,usable-memory-range base e800000, size 20000000
[    0.000000]  - e800000 ,  20000000
[    0.000000] linux,usable-memory-range base 396c0000, size a0000
[    0.000000]  - 396c0000 ,  a0000
[    0.000000] linux,usable-memory-range base 39770000, size 40000
[    0.000000]  - 39770000 ,  40000
[    0.000000] linux,usable-memory-range base 398a0000, size 20000
[    0.000000]  - 398a0000 ,  20000
[    0.000000] initrd not fully accessible via the linear mapping --
please check your bootloader ...
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
arm64_memblock_init+0x210/0x484
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
[    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
[    0.000000] PC is at arm64_memblock_init+0x210/0x484
[    0.000000] LR is at arm64_memblock_init+0x210/0x484
[    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
pstate: 600000c5
[    0.000000] sp : ffff000008ccfe80
[    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
[    0.000000] x27: 0000000011230000 x26: 00000000013b0000
[    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
[    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
[    0.000000] x21: ffff000008afa000 x20: ffff000008080000
[    0.000000] x19: ffff000008afa000 x18: 000000000c283806
[    0.000000] x17: 0000000000000000 x16: ffff000008d05580
[    0.000000] x15: 000000002be00842 x14: 79206b6365686320
[    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
[    0.000000] x11: 6d207261656e696c x10: 2065687420616976
[    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
[    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
[    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
[    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
[    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
[    0.000000] Call trace:
[    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
[    0.000000] fd40: 0000000000000056 0000000000000000
0000000000000000 0000000000000000
[    0.000000] fd60: 0000000000000001 ffff000008c96360
000000000000000d 746f6f622072756f
[    0.000000] fd80: ffff000008517414 00000000000000f4
2065687420616976 6d207261656e696c
[    0.000000] fda0: 2d20676e69707061 657361656c70202d
79206b6365686320 000000002be00842
[    0.000000] fdc0: ffff000008d05580 0000000000000000
000000000c283806 ffff000008afa000
[    0.000000] fde0: ffff000008080000 ffff000008afa000
ffff000009680000 ffff000008ec0000
[    0.000000] fe00: ffff000008cf3000 000000000fe80000
00000000013b0000 0000000011230000
[    0.000000] fe20: 000000000f370018 ffff000008ccfe80
ffff000008b76984 ffff000008ccfe80
[    0.000000] fe40: ffff000008b76984 00000000600000c5
ffff00000959b7a8 ffff000008ec0000
[    0.000000] fe60: ffffffffffffffff 0000000000000005
ffff000008ccfe80 ffff000008b76984
[    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
[    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] random: get_random_bytes called from
print_oops_end_marker+0x50/0x6c with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
[    0.000000] cma: Failed to reserve 512 MiB
[    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
------------   4.14.0+ #7
[    0.000000] Call trace:
[    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
[    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
[    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
[    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
[    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
[    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
[    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
[    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
[    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
allocate 0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]

I guess it is because of the 1G alignment requirement between the
kernel image and the initrd and how we populate the holes between the
kernel image, segments (including dtb) and the initrd from the
kexec-tools.
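
For anyone cross-checking the property dump above against the
"base ..., size ..." lines, the cells can be paired up as in the sketch
below. It is a stand-alone illustration, not the kernel's or kexec-tools'
parser: struct mem_range, read_cells() and decode_ranges() are
hypothetical names, the cells are assumed to be already converted to host
byte order (as in the printout), and #address-cells = #size-cells = 2 is
assumed from the layout of the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for one decoded <base size> pair. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine n 32-bit cells (most significant cell first) into one value. */
static uint64_t read_cells(const uint32_t *cells, int n)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < n; i++)
		v = (v << 32) | cells[i];
	return v;
}

/*
 * Decode a flat cell array into ranges. Returns the number of ranges
 * written to out (at most max_out); a trailing partial entry is ignored.
 */
static size_t decode_ranges(const uint32_t *cells, size_t ncells,
			    int addr_cells, int size_cells,
			    struct mem_range *out, size_t max_out)
{
	size_t stride = (size_t)(addr_cells + size_cells);
	size_t off, n = 0;

	for (off = 0; off + stride <= ncells && n < max_out; off += stride) {
		out[n].base = read_cells(cells + off, addr_cells);
		out[n].size = read_cells(cells + off + addr_cells, size_cells);
		n++;
	}
	return n;
}
```

Feeding it the sixteen cells printed above yields four ranges, the first
being base 0xe800000 with size 0x20000000 and the last base 0x398a0000
with size 0x20000, which matches what the crash kernel reports.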

Akashi, any pointers on this will be helpful as well.

Regards,
Bhupesh


>> >
>> > Regards,
>> > Bhupesh
>> >
>> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> > >> via a kernel command line parameter, "memmap=".
>> > >>
>> > _______________________________________________
>> > kexec mailing list -- kexec at lists.fedoraproject.org
>> > To unsubscribe send an email to kexec-leave at lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18  8:59                                                                 ` Bhupesh SHARMA
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-12-18  8:59 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse,
	Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming

On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
>> to kexec@lists.infradead.org
>>
>> Also add linux-acpi list
>
> Thank you.
>
>> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> > <ard.biesheuvel@linaro.org> wrote:
>> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > > <takahiro.akashi@linaro.org> wrote:
>> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> > >>> <takahiro.akashi@linaro.org> wrote:
>> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> > >>> >> <takahiro.akashi@linaro.org> wrote:
>> > >>> >> > Bhupesh, Ard,
>> > >>> >> >
>> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> > >>> >> >> Hi Ard, Akashi
>> > >>> >> >>
>> > >>> >> > (snip)
>> > >>> >> >
>> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> > >>> >> >> , for details)
>> > >>> >> >
>> > >>> >> > Right.
>> > >>> >> >
>> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> > >>> >> >> with the crashkernel memory range:
>> > >>> >> >>
>> > >>> >> >>                 /* add linux,usable-memory-range */
>> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> > >>> >> >>                                 address_cells, size_cells);
>> > >>> >> >>
>> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> > >>> >> >> , for details)
>> > >>> >> >>
>> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> > >>> >> >> they are marked as System RAM or as RESERVED. As,
>> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> > >>> >> >>
>> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> > >>> >> >>
>> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> > >>> >> >> -r`.img --reuse-cmdline -d
>> > >>> >> >>
>> > >>> >> >> [snip..]
>> > >>> >> >>
>> > >>> >> >> Reserved memory range
>> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> > >>> >> >>
>> > >>> >> >> Coredump memory ranges
>> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> > >>> >> >>
>> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> > >>> >> >> memory cap'ing passed to the crash kernel inside
>> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> > >>> >> >>
>> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> > >>> >> >> {
>> > >>> >> >>         struct memblock_region reg = {
>> > >>> >> >>                 .size = 0,
>> > >>> >> >>         };
>> > >>> >> >>
>> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> > >>> >> >>
>> > >>> >> >>         if (reg.size)
>> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> > >>> >> >> comment this out */
>> > >>> >> >> }
>> > >>> >> >
>> > >>> >> > Please just don't do that. It can cause a fatal damage on
>> > >>> >> > memory contents of the *crashed* kernel.
>> > >>> >> >
>> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> > >>> >> >>
>> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> > >>> >> >> fail.
>> > >>> >> >>
>> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> > >>> >> >> dt node 'linux,usable-memory-range'
>> > >>> >> >
>> > >>> >> > I still don't understand why we need to carry over the information
>> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> > >>> >> > such regions are free to be reused by the kernel after some point of
>> > >>> >> > initialization. Why does crash dump kernel need to know about them?
>> > >>> >> >
>> > >>> >>
>> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> > >>> >> kernel, those regions needs to be preserved, which is why they are
>> > >>> >> memblock_reserve()'d now.
>> > >>> >
>> > >>> > For my better understandings, who is actually accessing such regions
>> > >>> > during boot time, uefi itself or efistub?
>> > >>> >
>> > >>>
>> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> > >>> instance, on QEMU we have
>> > >>>
>> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> > >>>   01000013)
>> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> > >>> BXPC 00000001)
>> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> > >>> BXPC 00000001)
>> > >>>
>> > >>> covered by
>> > >>>
>> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> > >>>  ...
>> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> > >>
>> > >> OK. I mistakenly understood those regions could be freed after exiting
>> > >> UEFI boot services.
>> > >>
>> > >>>
>> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> > >>> >> when booting the next kernel.
>> > >>> >
>> > >>> > not really.
>> > >>> >
>> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> > >>> >> > on crash dump kernel?)
>> > >>> >> >
>> > >>> >>
>> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> > >>> >> regions only revealed the bug, not created it (given that other
>> > >>> >> memblock_reserve regions may be affected as well)
>> > >>> >
>> > >>> > As whether we should honor such reserved regions over kexec'ing
>> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> > >>> > As a matter of fact, no information about "reserved" memblocks is
>> > >>> > exposed to user space (via proc/iomem).
>> > >>> >
>> > >>>
>> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> > >>> as 'System RAM'. Do you think that could solve this?
>> > >>
>> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> > >> marking them under another name in /proc/iomem would also be good in order
>> > >> not to allocate them as part of crash kernel's memory.
>> > >>
>> > >
>> > > I agree. However, this may not be entirely trivial, since iterating
>> > > over the memblock_reserved table and creating iomem entries may result
>> > > in collisions.
>> >
>> > I found a method (using the patch I shared earlier in this thread) to mark these
>> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> > reserved regions.
>> >
>> > >> But I'm not still convinced that we should export them in useable-
>> > >> memory-range to crash dump kernel. They will be accessed through
>> > >> acpi_os_map_memory() and so won't be required to be part of system ram
>> > >> (or memblocks), I guess.
>> > >
>> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > > which is exactly what we want in this case.
>> >
>> > Now this is what is confusing me. I don't see the above happening.
>> >
>> > I see that the primary kernel boots up and adds the ACPI regions via:
>> > acpi_os_ioremap
>> >     -> ioremap_cache
>> >
>> > But during the crashkernel boot, 'acpi_os_ioremap' calls
>> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> > variant.
>
> It is natural if that region is out of memblocks.

Thanks for the confirmation. This was my understanding as well.

>> > And it fails while accessing the ACPI tables:
>> >
>> > [    0.039205] ACPI: Core revision 20170728
>> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>
> this (ESR = 0x96000021) means that Data Abort and Alignment fault happened.
> As ioremap() makes the mapping as "Device memory", unaligned memory
> access won't be allowed.
>
>> > [    0.100022] Modules linked in:
>> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> > pstate: 60000045
>> > [    0.132647] sp : ffff000008ccfb40
>> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> > [    0.223224] Call trace:
>> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> > ffff0000095e3980 ffff000008ccfbe0
>> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> > ffff000008ccfc50 0000000000000000
>> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> > 00000000ffffff76 0000000000000006
>> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> > 000000000000038e 0000000000000000
>> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> > 0000000000000005 000000000000001b
>> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> > ffff000009710027 0000000000000001
>> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> > 0000000000000000 ffff0000088be820
>> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> > ffff00000849b4f8 ffff000008ccfb40
>> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> > ffff000008ccfb40 ffff000008260a18
>> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> > ffff000008ccfb40 ffff0000084a6764
>> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> > [    0.404437] Rebooting in 10 seconds.
>> >
>> > So, I think the linear mapping done by the primary kernel does not
>> > make these accessible in the crash kernel directly.
>> >
>> > Any pointers?
>>
>> Can you get the code line number for acpi_ns_lookup+0x25c?
>
> So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned
> accesses?
> (I didn't find out how unaligned accesses could happen there.)
>

Right. As I noted earlier in this thread (perhaps in the first email on
this subject), this is indeed an unaligned address access.
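
For reference, the ESR value from the Oops can be decoded with a few
lines of C. This is a minimal sketch: the macro and function names are
made up for illustration, and the field positions (EC in bits [31:26],
fault status code in bits [5:0]) are taken from the Arm architecture
definition of ESR_ELx. As far as I can tell, the faulting instruction
word b94002c0 is "ldr w0, [x22]", and x22 (ffff000009710027) is not
4-byte aligned, which matches the decoded fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; field layout follows the ESR_ELx definition. */
#define ESR_EC_SHIFT	26
#define ESR_EC_MASK	0x3fu
#define ESR_DFSC_MASK	0x3fu	/* data fault status code, ISS[5:0] */

#define EC_DABT_CUR_EL	0x25u	/* Data Abort, no change in Exception level */
#define DFSC_ALIGNMENT	0x21u	/* Alignment fault */

static inline unsigned int esr_ec(uint32_t esr)
{
	return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

static inline unsigned int esr_dfsc(uint32_t esr)
{
	return esr & ESR_DFSC_MASK;
}

/* True if the syndrome describes an alignment fault on a data access. */
static inline bool esr_is_alignment_dabt(uint32_t esr)
{
	return esr_ec(esr) == EC_DABT_CUR_EL &&
	       esr_dfsc(esr) == DFSC_ALIGNMENT;
}

/* True if addr is a multiple of size (size must be a power of two). */
static inline bool is_aligned(uint64_t addr, uint64_t size)
{
	return (addr & (size - 1)) == 0;
}
```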

Now, modifying acpi_os_ioremap() to avoid ioremap(), so that this
memory range is not mapped as device memory, doesn't seem a neat
solution: it means we are not marking something with the right memory
attribute, and we could run into similar/related issues later.

Regarding the latter suggestion, what I am seeing now is that the ACPI
table access functions are perhaps reused from the earlier x86
implementation, but on the arm64 (or even arm) arch we should not be
allowing unaligned accesses, which might cause UNDEFINED behaviour and
a resultant crash.
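
To illustrate what an alignment-safe access could look like, here is a
minimal sketch. It assumes a little-endian 32-bit field, and
read_le32_bytewise() is a hypothetical helper, not an existing ACPICA
or kernel function. It only issues byte loads, which are naturally
aligned at any address, instead of a single 32-bit load through a
possibly misaligned pointer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble a little-endian 32-bit value from four
 * single-byte loads, so no access is wider than its alignment allows.
 */
static inline uint32_t read_le32_bytewise(const volatile uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

The trade-off is four loads instead of one, which should not matter
for table parsing at boot time.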

So I can try this approach and see if it works for me.

However, I am still not sure why the crashkernel ranges historically
do not include the System RAM regions (which may include the ACPI
regions as well). These regions are available for kernel use and
perhaps should be exported to the crashkernel as well.

I am not fully aware of the previous discussions on capping the
crashkernel memory passed to the kdump kernel, but did we run into any
issues while doing so?

Also, even if I extend kexec-tools to modify
linux,usable-memory-range and add the ACPI regions to it, the
crashkernel fails to boot with the message below (I have added some
logic to print the DTB at the start of the crash kernel boot):

[    0.000000]     chosen {
[    0.000000]         linux,usable-memory-range
[    0.000000]  = <
[    0.000000] 0x00000000
[    0.000000] 0x0e800000
[    0.000000] 0x00000000
[    0.000000] 0x20000000
[    0.000000] 0x00000000
[    0.000000] 0x396c0000
[    0.000000] 0x00000000
[    0.000000] 0x000a0000
[    0.000000] 0x00000000
[    0.000000] 0x39770000
[    0.000000] 0x00000000
[    0.000000] 0x00040000
[    0.000000] 0x00000000
[    0.000000] 0x398a0000
[    0.000000] 0x00000000
[    0.000000] 0x00020000
[    0.000000] >
[    0.000000] ;

[snip..]

[    0.000000] linux,usable-memory-range base e800000, size 20000000
[    0.000000]  - e800000 ,  20000000
[    0.000000] linux,usable-memory-range base 396c0000, size a0000
[    0.000000]  - 396c0000 ,  a0000
[    0.000000] linux,usable-memory-range base 39770000, size 40000
[    0.000000]  - 39770000 ,  40000
[    0.000000] linux,usable-memory-range base 398a0000, size 20000
[    0.000000]  - 398a0000 ,  20000
[    0.000000] initrd not fully accessible via the linear mapping --
please check your bootloader ...
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
arm64_memblock_init+0x210/0x484
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
[    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
[    0.000000] PC is at arm64_memblock_init+0x210/0x484
[    0.000000] LR is at arm64_memblock_init+0x210/0x484
[    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
pstate: 600000c5
[    0.000000] sp : ffff000008ccfe80
[    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
[    0.000000] x27: 0000000011230000 x26: 00000000013b0000
[    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
[    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
[    0.000000] x21: ffff000008afa000 x20: ffff000008080000
[    0.000000] x19: ffff000008afa000 x18: 000000000c283806
[    0.000000] x17: 0000000000000000 x16: ffff000008d05580
[    0.000000] x15: 000000002be00842 x14: 79206b6365686320
[    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
[    0.000000] x11: 6d207261656e696c x10: 2065687420616976
[    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
[    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
[    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
[    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
[    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
[    0.000000] Call trace:
[    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
[    0.000000] fd40: 0000000000000056 0000000000000000
0000000000000000 0000000000000000
[    0.000000] fd60: 0000000000000001 ffff000008c96360
000000000000000d 746f6f622072756f
[    0.000000] fd80: ffff000008517414 00000000000000f4
2065687420616976 6d207261656e696c
[    0.000000] fda0: 2d20676e69707061 657361656c70202d
79206b6365686320 000000002be00842
[    0.000000] fdc0: ffff000008d05580 0000000000000000
000000000c283806 ffff000008afa000
[    0.000000] fde0: ffff000008080000 ffff000008afa000
ffff000009680000 ffff000008ec0000
[    0.000000] fe00: ffff000008cf3000 000000000fe80000
00000000013b0000 0000000011230000
[    0.000000] fe20: 000000000f370018 ffff000008ccfe80
ffff000008b76984 ffff000008ccfe80
[    0.000000] fe40: ffff000008b76984 00000000600000c5
ffff00000959b7a8 ffff000008ec0000
[    0.000000] fe60: ffffffffffffffff 0000000000000005
ffff000008ccfe80 ffff000008b76984
[    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
[    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] random: get_random_bytes called from
print_oops_end_marker+0x50/0x6c with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
[    0.000000] cma: Failed to reserve 512 MiB
[    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
------------   4.14.0+ #7
[    0.000000] Call trace:
[    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
[    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
[    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
[    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
[    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
[    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
[    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
[    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
[    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
[    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
[    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
allocate 0x0000000000010000 bytes below 0x0000000000000000.
[    0.000000]

I guess it is because of the 1G alignment requirement between the
kernel image and the initrd, and how we populate the holes between the
kernel image, segments (including dtb) and the initrd from
kexec-tools.
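
For reading the property dump above, the cells can be grouped into
(base, size) pairs as in the sketch below. It assumes #address-cells
and #size-cells are both 2, as the dump suggests, and that the cells
have already been converted to host byte order. The names usable_range
and parse_usable_ranges() are hypothetical and do not correspond to
the kernel's own parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one decoded (base, size) entry. */
struct usable_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Group 32-bit cells into ranges, assuming 2 address cells followed by
 * 2 size cells per entry. Returns the number of ranges written to out.
 * Trailing cells that do not form a complete entry are ignored.
 */
static size_t parse_usable_ranges(const uint32_t *cells, size_t ncells,
				  struct usable_range *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i + 4 <= ncells && n < max_out; i += 4, n++) {
		out[n].base = ((uint64_t)cells[i] << 32) | cells[i + 1];
		out[n].size = ((uint64_t)cells[i + 2] << 32) | cells[i + 3];
	}
	return n;
}
```

Applied to the 16 cells printed above, this gives four ranges, the
first being base 0xe800000 with size 0x20000000, which agrees with the
"linux,usable-memory-range base ..., size ..." lines in the log.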

Akashi, any pointers on this will be helpful as well.

Regards,
Bhupesh


>> >
>> > Regards,
>> > Bhupesh
>> >
>> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> > >> via a kernel command line parameter, "memmap=".
>> > >>
>> > _______________________________________________
>> > kexec mailing list -- kexec@lists.fedoraproject.org
>> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18  8:59                                                                 ` Bhupesh SHARMA
  (?)
  (?)
@ 2017-12-18 11:18                                                                     ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, James Morse,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, Matt Fleming

Bhupesh,

On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> >> to kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> >>
> >> Also add linux-acpi list
> >
> > Thank you.
> >
> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> > <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> > >>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> > >>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > >>> >> > Bhupesh, Ard,
> >> > >>> >> >
> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> > >>> >> >> Hi Ard, Akashi
> >> > >>> >> >>
> >> > >>> >> > (snip)
> >> > >>> >> >
> >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> > >>> >> >> , for details)
> >> > >>> >> >
> >> > >>> >> > Right.
> >> > >>> >> >
> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> > >>> >> >> with the crashkernel memory range:
> >> > >>> >> >>
> >> > >>> >> >>                 /* add linux,usable-memory-range */
> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> > >>> >> >>                                 address_cells, size_cells);
> >> > >>> >> >>
> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> > >>> >> >> , for details)
> >> > >>> >> >>
> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> > >>> >> >> they are marked as System RAM or as RESERVED. As,
> >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> > >>> >> >>
> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
> >> > >>> >> >>
> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> > >>> >> >> -r`.img --reuse-cmdline -d
> >> > >>> >> >>
> >> > >>> >> >> [snip..]
> >> > >>> >> >>
> >> > >>> >> >> Reserved memory range
> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> > >>> >> >>
> >> > >>> >> >> Coredump memory ranges
> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
> >> > >>> >> >>
> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> > >>> >> >> memory cap'ing passed to the crash kernel inside
> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> > >>> >> >>
> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> > >>> >> >> {
> >> > >>> >> >>         struct memblock_region reg = {
> >> > >>> >> >>                 .size = 0,
> >> > >>> >> >>         };
> >> > >>> >> >>
> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> > >>> >> >>
> >> > >>> >> >>         if (reg.size)
> >> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> > >>> >> >> comment this out */
> >> > >>> >> >> }
> >> > >>> >> >
> >> > >>> >> > Please just don't do that. It can cause a fatal damage on
> >> > >>> >> > memory contents of the *crashed* kernel.
> >> > >>> >> >
> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> > >>> >> >>
> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> > >>> >> >> fail.
> >> > >>> >> >>
> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> > >>> >> >> dt node 'linux,usable-memory-range'
> >> > >>> >> >
> >> > >>> >> > I still don't understand why we need to carry over the information
> >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> > >>> >> > such regions are free to be reused by the kernel after some point of
> >> > >>> >> > initialization. Why does crash dump kernel need to know about them?
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> > >>> >> kernel, those regions needs to be preserved, which is why they are
> >> > >>> >> memblock_reserve()'d now.
> >> > >>> >
> >> > >>> > For my better understandings, who is actually accessing such regions
> >> > >>> > during boot time, uefi itself or efistub?
> >> > >>> >
> >> > >>>
> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> > >>> instance, on QEMU we have
> >> > >>>
> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> > >>>   01000013)
> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>
> >> > >>> covered by
> >> > >>>
> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> > >>>  ...
> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> > >>
> >> > >> OK. I mistakenly understood those regions could be freed after exiting
> >> > >> UEFI boot services.
> >> > >>
> >> > >>>
> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> > >>> >> when booting the next kernel.
> >> > >>> >
> >> > >>> > not really.
> >> > >>> >
> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> > >>> >> > on crash dump kernel?)
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> > >>> >> regions only revealed the bug, not created it (given that other
> >> > >>> >> memblock_reserve regions may be affected as well)
> >> > >>> >
> >> > >>> > As whether we should honor such reserved regions over kexec'ing
> >> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >> > >>> > As a matter of fact, no information about "reserved" memblocks is
> >> > >>> > exposed to user space (via proc/iomem).
> >> > >>> >
> >> > >>>
> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> > >>> as 'System RAM'. Do you think that could solve this?
> >> > >>
> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> > >> marking them under another name in /proc/iomem would also be good in order
> >> > >> not to allocate them as part of crash kernel's memory.
> >> > >>
> >> > >
> >> > > I agree. However, this may not be entirely trivial, since iterating
> >> > > over the memblock_reserved table and creating iomem entries may result
> >> > > in collisions.
> >> >
> >> > I found a method (using the patch I shared earlier in this thread) to mark these
> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> > reserved regions.
> >> >
> >> > >> But I'm not still convinced that we should export them in useable-
> >> > >> memory-range to crash dump kernel. They will be accessed through
> >> > >> acpi_os_map_memory() and so won't be required to be part of system ram
> >> > >> (or memblocks), I guess.
> >> > >
> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > > which is exactly what we want in this case.
> >> >
> >> > Now this is what is confusing me. I don't see the above happening.
> >> >
> >> > I see that the primary kernel boots up and adds the ACPI regions via:
> >> > acpi_os_ioremap
> >> >     -> ioremap_cache
> >> >
> >> > But during the crashkernel boot, ''acpi_os_ioremap' calls
> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> > variant.
> >
> > It is natural if that region is out of memblocks.
> 
> Thanks for the confirmation. This was my understanding as well.
> 
> >> > And it fails while accessing the ACPI tables:
> >> >
> >> > [    0.039205] ACPI: Core revision 20170728
> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> >
> > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened.
> > As ioremap() makes the mapping as "Device memory", unaligned memory
> > access won't be allowed.
> >
> >> > [    0.100022] Modules linked in:
> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> >> > pstate: 60000045
> >> > [    0.132647] sp : ffff000008ccfb40
> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> >> > [    0.223224] Call trace:
> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> >> > ffff0000095e3980 ffff000008ccfbe0
> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> >> > ffff000008ccfc50 0000000000000000
> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> >> > 00000000ffffff76 0000000000000006
> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> >> > 000000000000038e 0000000000000000
> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
> >> > 0000000000000005 000000000000001b
> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> >> > ffff000009710027 0000000000000001
> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
> >> > 0000000000000000 ffff0000088be820
> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> >> > ffff00000849b4f8 ffff000008ccfb40
> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> >> > ffff000008ccfb40 ffff000008260a18
> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> >> > ffff000008ccfb40 ffff0000084a6764
> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
> >> > [    0.404437] Rebooting in 10 seconds.
> >> >
> >> > So, I think the linear mapping done by the primary kernel does not
> >> > make these accessible in the crash kernel directly.
> >> >
> >> > Any pointers?
> >>
> >> Can you get the code line number for acpi_ns_lookup+0x25c?
> >
> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned
> > accesses?
> > (I didn't find out how unaligned accesses could happen there.)
> >
> 
> Right. Like I captured somewhere in this thread (perhaps the first
> email on this subject),
> this is indeed an unaligned address access.
> 
> Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding
> assigning this memory range
> as device memory doesn't seem a neat solution as it means we are not
> marking some thing with the right memory attribute and we can fall in
> similar/related issues later.
> 
> Regarding the later suggestion, what I am seeing now is that the acpi
> table access functions are perhaps reused from the earlier x86
> implementation, but on the arm64 (or even arm) arch we should not be
> allowing unaligned accesses which might cause UNDEFINED behaviour and
> resultant crash.
> 
> So I can try going this approach and see if it works for me.
> 
> However, I am still not very sure as to why the crashkernel ranges
> historically do not include the System RAM regions (which may include
> the ACPI regions as well). These regions are available for the kernel
> usage and perhaps should be exported to the crashkernel as well.
> 
> I am not fully aware of the previous discussions on capp'ing the
> crashkernel memory being passed to the kdump kernel, but did we run
> into any issues while doing so?
> 
> Also, even if I extend the kexec-tools to modify the
> linux,usable-memory-range and add the ACPI regions to it, the
> crashkernel fails to boot with the below message (I have added some
> logic to print the DTB on the crash kernel boot start):
> 
> [    0.000000]     chosen {
> [    0.000000]         linux,usable-memory-range
> [    0.000000]  = <
> [    0.000000] 0x00000000
> [    0.000000] 0x0e800000
> [    0.000000] 0x00000000
> [    0.000000] 0x20000000
> [    0.000000] 0x00000000
> [    0.000000] 0x396c0000
> [    0.000000] 0x00000000
> [    0.000000] 0x000a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x39770000
> [    0.000000] 0x00000000
> [    0.000000] 0x00040000
> [    0.000000] 0x00000000
> [    0.000000] 0x398a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x00020000
> [    0.000000] >
> [    0.000000] ;
> 
> [snip..]
> 
> [    0.000000] linux,usable-memory-range base e800000, size 20000000
> [    0.000000]  - e800000 ,  20000000
> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
> [    0.000000]  - 396c0000 ,  a0000
> [    0.000000] linux,usable-memory-range base 39770000, size 40000
> [    0.000000]  - 39770000 ,  40000
> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
> [    0.000000]  - 398a0000 ,  20000
> [    0.000000] initrd not fully accessible via the linear mapping --
> please check your bootloader ...
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
> arm64_memblock_init+0x210/0x484
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
> pstate: 600000c5
> [    0.000000] sp : ffff000008ccfe80
> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
> [    0.000000] Call trace:
> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
> [    0.000000] fd40: 0000000000000056 0000000000000000
> 0000000000000000 0000000000000000
> [    0.000000] fd60: 0000000000000001 ffff000008c96360
> 000000000000000d 746f6f622072756f
> [    0.000000] fd80: ffff000008517414 00000000000000f4
> 2065687420616976 6d207261656e696c
> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
> 79206b6365686320 000000002be00842
> [    0.000000] fdc0: ffff000008d05580 0000000000000000
> 000000000c283806 ffff000008afa000
> [    0.000000] fde0: ffff000008080000 ffff000008afa000
> ffff000009680000 ffff000008ec0000
> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
> 00000000013b0000 0000000011230000
> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
> ffff000008b76984 ffff000008ccfe80
> [    0.000000] fe40: ffff000008b76984 00000000600000c5
> ffff00000959b7a8 ffff000008ec0000
> [    0.000000] fe60: ffffffffffffffff 0000000000000005
> ffff000008ccfe80 ffff000008b76984
> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x50/0x6c with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 512 MiB
> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
> 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
> ------------   4.14.0+ #7
> [    0.000000] Call trace:
> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
> allocate 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> 
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
> 
> Akashi, any pointers on this will be helpful as well.

Please show me:
 * "Virtual kernel memory layout" in dmesg
 * /proc/iomem
 * debug messages from kexec-tools (kexec -d)

-Takahiro AKASHI
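
For readers following the quoted boot log: the 'linux,usable-memory-range' cells dumped there pair up into (base, size) ranges. The following is only a minimal userspace sketch in C of that pairing. It assumes two address cells and two size cells, as the dump suggests, and that the cells are already in host byte order (a real FDT stores them big-endian). The names mem_range, read_cells and decode_ranges are hypothetical and are not taken from kexec-tools or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; not a kernel or kexec-tools struct. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Fold 1 or 2 32-bit cells (most significant cell first) into one value. */
static uint64_t read_cells(const uint32_t *p, int cells)
{
	uint64_t v = 0;

	assert(cells == 1 || cells == 2);
	for (int i = 0; i < cells; i++)
		v = (v << 32) | p[i];
	return v;
}

/*
 * Decode a flat cell array into (base, size) ranges.
 * Returns the number of ranges written to out (at most max_out).
 * Assumes the cells are already in host byte order.
 */
size_t decode_ranges(const uint32_t *prop, size_t ncells,
		     int addr_cells, int size_cells,
		     struct mem_range *out, size_t max_out)
{
	size_t stride = (size_t)addr_cells + (size_t)size_cells;
	size_t n = 0;

	for (size_t off = 0; off + stride <= ncells && n < max_out; off += stride) {
		out[n].base = read_cells(prop + off, addr_cells);
		out[n].size = read_cells(prop + off + addr_cells, size_cells);
		n++;
	}
	return n;
}
```

Fed the sixteen cells from the log, this gives four ranges. The first is base 0xe800000 with size 0x20000000, so it ends at 0x2e7fffff, which is the crashkernel reservation; the remaining three are the small ranges added for the ACPI regions.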


> Regards,
> Bhupesh
> 
> 
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>
> >> > _______________________________________________
> >> > kexec mailing list -- kexec@lists.fedoraproject.org
> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 11:18                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi,
	linux-kernel, linux-arm-kernel, James Morse, linux-efi,
	Mark Rutland, Matt Fleming

Bhupesh,

On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> >> to kexec@lists.infradead.org
> >>
> >> Also add linux-acpi list
> >
> > Thank you.
> >
> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> > <ard.biesheuvel@linaro.org> wrote:
> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > > <takahiro.akashi@linaro.org> wrote:
> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> > >>> <takahiro.akashi@linaro.org> wrote:
> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> > >>> >> <takahiro.akashi@linaro.org> wrote:
> >> > >>> >> > Bhupesh, Ard,
> >> > >>> >> >
> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> > >>> >> >> Hi Ard, Akashi
> >> > >>> >> >>
> >> > >>> >> > (snip)
> >> > >>> >> >
> >> > >>> >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> > >>> >> >> , for details)
> >> > >>> >> >
> >> > >>> >> > Right.
> >> > >>> >> >
> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> > >>> >> >> with the crashkernel memory range:
> >> > >>> >> >>
> >> > >>> >> >>                 /* add linux,usable-memory-range */
> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> > >>> >> >>                                 address_cells, size_cells);
> >> > >>> >> >>
> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> > >>> >> >> , for details)
> >> > >>> >> >>
> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> > >>> >> >> they are marked as System RAM or as RESERVED, because the
> >> > >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
> >> > >>> >> >>
> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
> >> > >>> >> >>
> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> > >>> >> >>
> >> > >>> >> >> [snip..]
> >> > >>> >> >>
> >> > >>> >> >> Reserved memory range
> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> > >>> >> >>
> >> > >>> >> >> Coredump memory ranges
> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
> >> > >>> >> >>
> >> > >>> >> >> 4). So if we revert Ard's patch or just comment out the memory
> >> > >>> >> >> capping applied to the crash kernel inside
> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> > >>> >> >>
> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> > >>> >> >> {
> >> > >>> >> >>         struct memblock_region reg = {
> >> > >>> >> >>                 .size = 0,
> >> > >>> >> >>         };
> >> > >>> >> >>
> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> > >>> >> >>
> >> > >>> >> >>         if (reg.size)
> >> > >>> >> >>                 ; /* memblock_cap_memory_range(reg.base, reg.size); commented out */
> >> > >>> >> >> }
> >> > >>> >> >
> >> > >>> >> > Please just don't do that. It can cause fatal damage to the
> >> > >>> >> > memory contents of the *crashed* kernel.
> >> > >>> >> >
> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> > >>> >> >>
> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> > >>> >> >> fail.
> >> > >>> >> >>
> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> > >>> >> >> dt node 'linux,usable-memory-range'
> >> > >>> >> >
> >> > >>> >> > I still don't understand why we need to carry over the information
> >> > >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
> >> > >>> >> > such regions are free to be reused by the kernel after some point of
> >> > >>> >> > initialization. Why does the crash dump kernel need to know about them?
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> > >>> >> kernel, those regions need to be preserved, which is why they are
> >> > >>> >> memblock_reserve()'d now.
> >> > >>> >
> >> > >>> > For my better understanding, who actually accesses such regions
> >> > >>> > at boot time: UEFI itself or the EFI stub?
> >> > >>> >
> >> > >>>
> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> > >>> instance, on QEMU we have
> >> > >>>
> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> > >>>   01000013)
> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>
> >> > >>> covered by
> >> > >>>
> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> > >>>  ...
> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> > >>
> >> > >> OK. I mistakenly understood those regions could be freed after exiting
> >> > >> UEFI boot services.
> >> > >>
> >> > >>>
> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> > >>> >> when booting the next kernel.
> >> > >>> >
> >> > >>> > not really.
> >> > >>> >
> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> > >>> >> > on crash dump kernel?)
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> > >>> >> regions only revealed the bug, not created it (given that other
> >> > >>> >> memblock_reserve regions may be affected as well)
> >> > >>> >
> >> > >>> > Whether we should honor such reserved regions across kexec depends
> >> > >>> > on the specific nature of each one, so we will have to handle them one by one.
> >> > >>> > As a matter of fact, no information about "reserved" memblocks is
> >> > >>> > exposed to user space (via /proc/iomem).
> >> > >>> >
> >> > >>>
> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> > >>> as 'System RAM'. Do you think that could solve this?
> >> > >>
> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> > >> marking them under another name in /proc/iomem would also be good in order
> >> > >> not to allocate them as part of crash kernel's memory.
> >> > >>
> >> > >
> >> > > I agree. However, this may not be entirely trivial, since iterating
> >> > > over the memblock_reserved table and creating iomem entries may result
> >> > > in collisions.
> >> >
> >> > I found a method (using the patch I shared earlier in this thread) to mark these
> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> > reserved regions.
> >> >
> >> > >> But I'm still not convinced that we should export them in
> >> > >> usable-memory-range to the crash dump kernel. They will be accessed through
> >> > >> acpi_os_map_memory() and so won't need to be part of system RAM
> >> > >> (or memblocks), I guess.
> >> > >
> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > > which is exactly what we want in this case.
> >> >
> >> > Now this is what is confusing me. I don't see the above happening.
> >> >
> >> > I see that the primary kernel boots up and adds the ACPI regions via:
> >> > acpi_os_ioremap
> >> >     -> ioremap_cache
> >> >
> >> > But during the crashkernel boot, 'acpi_os_ioremap' calls
> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> > variant.
> >
> > That is expected if the region is outside the memblocks.
> 
> Thanks for the confirmation. This was my understanding as well.
> 
> >> > And it fails while accessing the ACPI tables:
> >> >
> >> > [    0.039205] ACPI: Core revision 20170728
> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> >
> > This (ESR = 0x96000021) means that a Data Abort caused by an alignment fault happened.
> > As ioremap() creates the mapping as "Device memory", unaligned memory
> > accesses are not allowed.
> >
> >> > [    0.100022] Modules linked in:
> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> >> > pstate: 60000045
> >> > [    0.132647] sp : ffff000008ccfb40
> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> >> > [    0.223224] Call trace:
> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> >> > ffff0000095e3980 ffff000008ccfbe0
> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> >> > ffff000008ccfc50 0000000000000000
> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> >> > 00000000ffffff76 0000000000000006
> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> >> > 000000000000038e 0000000000000000
> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
> >> > 0000000000000005 000000000000001b
> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> >> > ffff000009710027 0000000000000001
> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
> >> > 0000000000000000 ffff0000088be820
> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> >> > ffff00000849b4f8 ffff000008ccfb40
> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> >> > ffff000008ccfb40 ffff000008260a18
> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> >> > ffff000008ccfb40 ffff0000084a6764
> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
> >> > [    0.404437] Rebooting in 10 seconds.
> >> >
> >> > So, I think the linear mapping done by the primary kernel does not
> >> > make these accessible in the crash kernel directly.
> >> >
> >> > Any pointers?
> >>
> >> Can you get the code line number for acpi_ns_lookup+0x25c?
> >
> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> > modify acpi_ns_lookup() (or any other ACPI functions) to prevent unaligned
> > accesses?
> > (I couldn't find how unaligned accesses could happen there.)
> >
> 
> Right. Like I captured somewhere in this thread (perhaps the first
> email on this subject),
> this is indeed an unaligned address access.
> 
> Now, modifying acpi_os_ioremap() to not use ioremap(), and thus to avoid
> mapping this memory range as device memory, doesn't seem a neat solution,
> as it means we are not marking something with the right memory attribute
> and we may run into similar or related issues later.
> 
> Regarding the latter suggestion, what I am seeing now is that the ACPI
> table access functions are perhaps reused from the earlier x86
> implementation, but on the arm64 (or even arm) arch we should not be
> allowing unaligned accesses, which might cause UNDEFINED behaviour and
> a resultant crash.
> 
> So I can try this approach and see if it works for me.
> 
> However, I am still not very sure as to why the crashkernel ranges
> historically do not include the System RAM regions (which may include
> the ACPI regions as well). These regions are available for the kernel
> usage and perhaps should be exported to the crashkernel as well.
> 
> I am not fully aware of the previous discussions on capping the
> crashkernel memory being passed to the kdump kernel, but did we run
> into any issues while doing so?
> 
> Also, even if I extend the kexec-tools to modify the
> linux,usable-memory-range and add the ACPI regions to it, the
> crashkernel fails to boot with the below message (I have added some
> logic to print the DTB on the crash kernel boot start):
> 
> [    0.000000]     chosen {
> [    0.000000]         linux,usable-memory-range
> [    0.000000]  = <
> [    0.000000] 0x00000000
> [    0.000000] 0x0e800000
> [    0.000000] 0x00000000
> [    0.000000] 0x20000000
> [    0.000000] 0x00000000
> [    0.000000] 0x396c0000
> [    0.000000] 0x00000000
> [    0.000000] 0x000a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x39770000
> [    0.000000] 0x00000000
> [    0.000000] 0x00040000
> [    0.000000] 0x00000000
> [    0.000000] 0x398a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x00020000
> [    0.000000] >
> [    0.000000] ;
> 
> [snip..]
> 
> [    0.000000] linux,usable-memory-range base e800000, size 20000000
> [    0.000000]  - e800000 ,  20000000
> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
> [    0.000000]  - 396c0000 ,  a0000
> [    0.000000] linux,usable-memory-range base 39770000, size 40000
> [    0.000000]  - 39770000 ,  40000
> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
> [    0.000000]  - 398a0000 ,  20000
> [    0.000000] initrd not fully accessible via the linear mapping --
> please check your bootloader ...
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
> arm64_memblock_init+0x210/0x484
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
> pstate: 600000c5
> [    0.000000] sp : ffff000008ccfe80
> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
> [    0.000000] Call trace:
> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
> [    0.000000] fd40: 0000000000000056 0000000000000000
> 0000000000000000 0000000000000000
> [    0.000000] fd60: 0000000000000001 ffff000008c96360
> 000000000000000d 746f6f622072756f
> [    0.000000] fd80: ffff000008517414 00000000000000f4
> 2065687420616976 6d207261656e696c
> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
> 79206b6365686320 000000002be00842
> [    0.000000] fdc0: ffff000008d05580 0000000000000000
> 000000000c283806 ffff000008afa000
> [    0.000000] fde0: ffff000008080000 ffff000008afa000
> ffff000009680000 ffff000008ec0000
> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
> 00000000013b0000 0000000011230000
> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
> ffff000008b76984 ffff000008ccfe80
> [    0.000000] fe40: ffff000008b76984 00000000600000c5
> ffff00000959b7a8 ffff000008ec0000
> [    0.000000] fe60: ffffffffffffffff 0000000000000005
> ffff000008ccfe80 ffff000008b76984
> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x50/0x6c with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 512 MiB
> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
> 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
> ------------   4.14.0+ #7
> [    0.000000] Call trace:
> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
> allocate 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> 
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
> 
> Akashi, any pointers on this will be helpful as well.

Please show me:
 * "Virtual kernel memory layout" in dmesg
 * /proc/iomem
 * debug messages from kexec-tools (kexec -d)

-Takahiro AKASHI
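
For reference, the ESR value quoted above (0x96000021) can be decoded mechanically. The following is a minimal userspace sketch in C, assuming the ARMv8-A ESR_ELx field layout (EC in bits [31:26], IL in bit 25, and, for data aborts, WnR in bit 6 and DFSC in bits [5:0]). The macro and function names are made up for illustration and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field extraction, assuming the ARMv8-A ESR_ELx layout. */
#define ESR_EC(esr)   (((esr) >> 26) & 0x3fu) /* exception class, bits [31:26] */
#define ESR_IL(esr)   (((esr) >> 25) & 0x1u)  /* instruction length, bit 25 */
#define ESR_WNR(esr)  (((esr) >> 6) & 0x1u)   /* data abort only: 1 = write, 0 = read */
#define ESR_DFSC(esr) ((esr) & 0x3fu)         /* data fault status code, bits [5:0] */

#define EC_DABT_LOWER_EL 0x24u /* data abort from a lower exception level */
#define EC_DABT_SAME_EL  0x25u /* data abort without a change in exception level */
#define DFSC_ALIGNMENT   0x21u /* alignment fault */

/* Return a short, fixed description of the syndrome (illustrative only). */
const char *esr_describe(uint32_t esr)
{
	unsigned int ec = ESR_EC(esr);

	if (ec != EC_DABT_LOWER_EL && ec != EC_DABT_SAME_EL)
		return "not a data abort";
	if (ESR_DFSC(esr) == DFSC_ALIGNMENT)
		return "data abort: alignment fault";
	return "data abort: other fault";
}

/* Check the decoder against the value from the oops in this thread. */
void esr_selftest(void)
{
	const uint32_t esr = 0x96000021u;

	assert(ESR_EC(esr) == EC_DABT_SAME_EL);
	assert(ESR_DFSC(esr) == DFSC_ALIGNMENT);
	assert(strcmp(esr_describe(esr), "data abort: alignment fault") == 0);
}
```

For the value in the oops, EC is 0x25 (a data abort taken without a change in exception level), DFSC is 0x21 (alignment fault) and WnR is 0 (a read), which agrees with the explanation given in the quoted text.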


> Regards,
> Bhupesh
> 
> 
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>
> >> > _______________________________________________
> >> > kexec mailing list -- kexec@lists.fedoraproject.org
> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 11:18                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

Bhupesh,

On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> >> to kexec@lists.infradead.org
> >>
> >> Also add linux-acpi list
> >
> > Thank you.
> >
> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> > <ard.biesheuvel@linaro.org> wrote:
> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > > <takahiro.akashi@linaro.org> wrote:
> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> > >>> <takahiro.akashi@linaro.org> wrote:
> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> > >>> >> <takahiro.akashi@linaro.org> wrote:
> >> > >>> >> > Bhupesh, Ard,
> >> > >>> >> >
> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> > >>> >> >> Hi Ard, Akashi
> >> > >>> >> >>
> >> > >>> >> > (snip)
> >> > >>> >> >
> >> > >>> >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> > >>> >> >> , for details)
> >> > >>> >> >
> >> > >>> >> > Right.
> >> > >>> >> >
> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> > >>> >> >> with the crashkernel memory range:
> >> > >>> >> >>
> >> > >>> >> >>                 /* add linux,usable-memory-range */
> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> > >>> >> >>                                 address_cells, size_cells);
> >> > >>> >> >>
> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> > >>> >> >> , for details)
> >> > >>> >> >>
> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> > >>> >> >> they are marked as System RAM or as RESERVED, because the
> >> > >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
> >> > >>> >> >>
> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
> >> > >>> >> >>
> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> > >>> >> >>
> >> > >>> >> >> [snip..]
> >> > >>> >> >>
> >> > >>> >> >> Reserved memory range
> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> > >>> >> >>
> >> > >>> >> >> Coredump memory ranges
> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
> >> > >>> >> >>
> >> > >>> >> >> 4). So if we revert Ard's patch or just comment out the memory
> >> > >>> >> >> capping applied to the crash kernel inside
> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> > >>> >> >>
> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> > >>> >> >> {
> >> > >>> >> >>         struct memblock_region reg = {
> >> > >>> >> >>                 .size = 0,
> >> > >>> >> >>         };
> >> > >>> >> >>
> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> > >>> >> >>
> >> > >>> >> >>         if (reg.size)
> >> > >>> >> >>                 ; /* memblock_cap_memory_range(reg.base, reg.size); commented out */
> >> > >>> >> >> }
> >> > >>> >> >
> >> > >>> >> > Please just don't do that. It can cause fatal damage to the
> >> > >>> >> > memory contents of the *crashed* kernel.
> >> > >>> >> >
> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> > >>> >> >>
> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> > >>> >> >> fail.
> >> > >>> >> >>
> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> > >>> >> >> dt node 'linux,usable-memory-range'
> >> > >>> >> >
> >> > >>> >> > I still don't understand why we need to carry over the information
> >> > >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
> >> > >>> >> > such regions are free to be reused by the kernel after some point of
> >> > >>> >> > initialization. Why does the crash dump kernel need to know about them?
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> > >>> >> kernel, those regions need to be preserved, which is why they are
> >> > >>> >> memblock_reserve()'d now.
> >> > >>> >
> >> > >>> > For my better understanding, who actually accesses such regions
> >> > >>> > at boot time: UEFI itself or the EFI stub?
> >> > >>> >
> >> > >>>
> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> > >>> instance, on QEMU we have
> >> > >>>
> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> > >>>   01000013)
> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>
> >> > >>> covered by
> >> > >>>
> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> > >>>  ...
> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
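The claim that the listed tables are "covered by" the ACPI Reclaim Memory entries can be checked mechanically. The sketch below hard-codes only the two ranges visible in the quoted efi: lines (the elided "..." entries are not reproduced), and in_acpi_reclaim() is a name made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* The two ACPI Reclaim Memory ranges visible in the quoted efi: lines. */
static const struct region acpi_reclaim[] = {
	{ 0x788e0000ULL, 0x7894ffffULL },
	{ 0x78970000ULL, 0x7898ffffULL },
};

/* Return 1 if the table [addr, addr + len - 1] lies inside one range. */
int in_acpi_reclaim(uint64_t addr, uint64_t len)
{
	uint64_t last;

	assert(len > 0);
	last = addr + len - 1;

	for (size_t i = 0; i < sizeof(acpi_reclaim) / sizeof(acpi_reclaim[0]); i++)
		if (addr >= acpi_reclaim[i].start && last <= acpi_reclaim[i].end)
			return 1;
	return 0;
}
```

For example, in_acpi_reclaim(0x78940000, 0x11DA) tests the DSDT entry, and in_acpi_reclaim(0x78980000, 0x24) tests the RSDP.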
> >> > >>
> >> > >> OK. I mistakenly understood those regions could be freed after exiting
> >> > >> UEFI boot services.
> >> > >>
> >> > >>>
> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> > >>> >> when booting the next kernel.
> >> > >>> >
> >> > >>> > not really.
> >> > >>> >
> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> > >>> >> > on crash dump kernel?)
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> > >>> >> regions only revealed the bug, not created it (given that other
> >> > >>> >> memblock_reserve regions may be affected as well)
> >> > >>> >
> >> > >>> > Whether we should honor such reserved regions across kexec
> >> > >>> > depends on each one's specific nature, so we will have to handle them one by one.
> >> > >>> > As a matter of fact, no information about "reserved" memblocks is
> >> > >>> > exposed to user space (via /proc/iomem).
> >> > >>> >
> >> > >>>
> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> > >>> as 'System RAM'. Do you think that could solve this?
> >> > >>
> >> > >> Memblock-reserving them is necessary to prevent their corruption and
> >> > >> marking them under another name in /proc/iomem would also be good in order
> >> > >> not to allocate them as part of crash kernel's memory.
> >> > >>
> >> > >
> >> > > I agree. However, this may not be entirely trivial, since iterating
> >> > > over the memblock_reserved table and creating iomem entries may result
> >> > > in collisions.
> >> >
> >> > I found a method (using the patch I shared earlier in this thread) to mark these
> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> > reserved regions.
> >> >
> >> > >> But I'm still not convinced that we should export them in
> >> > >> linux,usable-memory-range to the crash dump kernel. They will be accessed through
> >> > >> acpi_os_map_memory() and so won't need to be part of System RAM
> >> > >> (or memblocks), I guess.
> >> > >
> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > > which is exactly what we want in this case.
> >> >
> >> > Now this is what is confusing me. I don't see the above happening.
> >> >
> >> > I see that the primary kernel boots up and adds the ACPI regions via:
> >> > acpi_os_ioremap
> >> >     -> ioremap_cache
> >> >
> >> > But during the crashkernel boot, 'acpi_os_ioremap' calls
> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> > variant.
> >
> > That is expected if the region is outside the memblocks.
> 
> Thanks for the confirmation. This was my understanding as well.
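The behaviour discussed here, ioremap_cache() in the primary kernel versus ioremap() in the crash kernel, can be modelled in user space. This is only a sketch of the decision as described in this thread (cached mapping if the address is known RAM, device mapping otherwise). The model_* names are invented, and the memory lists passed in are up to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
	uint64_t base;
	uint64_t size;
};

enum map_type { MAP_DEVICE, MAP_NORMAL_CACHED };

/* Model of a memblock_is_memory()-style lookup over a region list. */
int model_is_memory(const struct mem_region *mem, size_t n, uint64_t phys)
{
	for (size_t i = 0; i < n; i++)
		if (phys >= mem[i].base && phys - mem[i].base < mem[i].size)
			return 1;
	return 0;
}

/*
 * Model of the choice described above: a cached (normal memory) mapping
 * when the address is known RAM, a device mapping otherwise.
 */
enum map_type model_acpi_os_ioremap(const struct mem_region *mem, size_t n,
				    uint64_t phys)
{
	assert(mem != NULL || n == 0);
	return model_is_memory(mem, n, phys) ? MAP_NORMAL_CACHED : MAP_DEVICE;
}
```

With the crash kernel's memory capped to base 0xe800000, size 0x20000000, a table page at 0x39710000 (the page in the pte value of the oops) falls outside the list and gets the device mapping. With a primary-kernel list that includes that page, the same call yields the cached mapping.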
> 
> >> > And it fails while accessing the ACPI tables:
> >> >
> >> > [    0.039205] ACPI: Core revision 20170728
> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> >
> > > This (ESR = 0x96000021) means that a Data Abort with an alignment fault
> > > happened. As ioremap() maps the region as "Device memory", unaligned
> > > memory accesses are not allowed.
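The ESR reading above can be reproduced by splitting the value into its fields. The sketch assumes the AArch64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, fault status code in bits [5:0] for data aborts). The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Exception class: bits [31:26] of ESR_ELx. */
unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Instruction length bit: bit 25 (1 means a 32-bit instruction). */
unsigned int esr_il(uint32_t esr)
{
	return (esr >> 25) & 0x1;
}

/* Data fault status code: bits [5:0] of the ISS for data aborts. */
unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* Natural alignment check for an access of 'size' bytes (power of two). */
int is_aligned(uint64_t addr, unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (uint64_t)(size - 1)) == 0;
}
```

For 0x96000021 this gives EC = 0x25 (data abort taken without a change in exception level) and DFSC = 0x21 (alignment fault). In the register dump, x22 holds ffff000009710027, which is not 4-byte aligned. The faulting instruction word b94002c0 decodes, as far as I can tell, to a 32-bit load from [x22], which would be consistent with the alignment fault on a device mapping.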
> >
> >> > [    0.100022] Modules linked in:
> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> >> > pstate: 60000045
> >> > [    0.132647] sp : ffff000008ccfb40
> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> >> > [    0.223224] Call trace:
> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> >> > ffff0000095e3980 ffff000008ccfbe0
> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> >> > ffff000008ccfc50 0000000000000000
> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> >> > 00000000ffffff76 0000000000000006
> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> >> > 000000000000038e 0000000000000000
> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
> >> > 0000000000000005 000000000000001b
> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> >> > ffff000009710027 0000000000000001
> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
> >> > 0000000000000000 ffff0000088be820
> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> >> > ffff00000849b4f8 ffff000008ccfb40
> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> >> > ffff000008ccfb40 ffff000008260a18
> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> >> > ffff000008ccfb40 ffff0000084a6764
> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
> >> > [    0.404437] Rebooting in 10 seconds.
> >> >
> >> > So, I think the linear mapping done by the primary kernel does not
> >> > make these accessible in the crash kernel directly.
> >> >
> >> > Any pointers?
> >>
> >> Can you get the code line number for acpi_ns_lookup+0x25c?
> >
> > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> > > modify acpi_ns_lookup() (or any other ACPI functions) to prevent unaligned
> > > accesses?
> > > (I couldn't work out how unaligned accesses could happen there.)
> >
> 
> Right. As I noted earlier in this thread (perhaps in the first email on
> this subject), this is indeed an unaligned address access.
> 
> Now, modifying acpi_os_ioremap() to not ioremap(), and thus not mapping
> this memory range as device memory, doesn't seem a neat solution: it
> means we are not marking something with the right memory attribute, and
> we could run into similar/related issues later.
> 
> Regarding the latter suggestion, what I am seeing now is that the acpi
> table access functions are perhaps reused from the earlier x86
> implementation, but on the arm64 (or even arm) arch we should not be
> allowing unaligned accesses which might cause UNDEFINED behaviour and
> a resultant crash.
> 
> So I can try this approach and see if it works for me.
> 
> However, I am still not very sure as to why the crashkernel ranges
> historically do not include the System RAM regions (which may include
> the ACPI regions as well). These regions are available for the kernel
> usage and perhaps should be exported to the crashkernel as well.
> 
> I am not fully aware of the previous discussions on capping the
> crashkernel memory being passed to the kdump kernel, but did we run
> into any issues while doing so?
> 
> Also, even if I extend the kexec-tools to modify the
> linux,usable-memory-range and add the ACPI regions to it, the
> crashkernel fails to boot with the below message (I have added some
> logic to print the DTB on the crash kernel boot start):
> 
> [    0.000000]     chosen {
> [    0.000000]         linux,usable-memory-range
> [    0.000000]  = <
> [    0.000000] 0x00000000
> [    0.000000] 0x0e800000
> [    0.000000] 0x00000000
> [    0.000000] 0x20000000
> [    0.000000] 0x00000000
> [    0.000000] 0x396c0000
> [    0.000000] 0x00000000
> [    0.000000] 0x000a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x39770000
> [    0.000000] 0x00000000
> [    0.000000] 0x00040000
> [    0.000000] 0x00000000
> [    0.000000] 0x398a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x00020000
> [    0.000000] >
> [    0.000000] ;
> 
> [snip..]
> 
> [    0.000000] linux,usable-memory-range base e800000, size 20000000
> [    0.000000]  - e800000 ,  20000000
> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
> [    0.000000]  - 396c0000 ,  a0000
> [    0.000000] linux,usable-memory-range base 39770000, size 40000
> [    0.000000]  - 39770000 ,  40000
> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
> [    0.000000]  - 398a0000 ,  20000
> [    0.000000] initrd not fully accessible via the linear mapping --
> please check your bootloader ...
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
> arm64_memblock_init+0x210/0x484
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
> pstate: 600000c5
> [    0.000000] sp : ffff000008ccfe80
> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
> [    0.000000] Call trace:
> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
> [    0.000000] fd40: 0000000000000056 0000000000000000
> 0000000000000000 0000000000000000
> [    0.000000] fd60: 0000000000000001 ffff000008c96360
> 000000000000000d 746f6f622072756f
> [    0.000000] fd80: ffff000008517414 00000000000000f4
> 2065687420616976 6d207261656e696c
> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
> 79206b6365686320 000000002be00842
> [    0.000000] fdc0: ffff000008d05580 0000000000000000
> 000000000c283806 ffff000008afa000
> [    0.000000] fde0: ffff000008080000 ffff000008afa000
> ffff000009680000 ffff000008ec0000
> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
> 00000000013b0000 0000000011230000
> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
> ffff000008b76984 ffff000008ccfe80
> [    0.000000] fe40: ffff000008b76984 00000000600000c5
> ffff00000959b7a8 ffff000008ec0000
> [    0.000000] fe60: ffffffffffffffff 0000000000000005
> ffff000008ccfe80 ffff000008b76984
> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x50/0x6c with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 512 MiB
> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
> 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
> ------------   4.14.0+ #7
> [    0.000000] Call trace:
> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
> allocate 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> 
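The printed cells of 'linux,usable-memory-range' can be decoded back into (base, size) pairs to cross-check them against the "base ..., size ..." debug lines. The sketch assumes two address cells and two size cells per entry, as the dump suggests, and takes the cell values already converted to host order. decode_usable_ranges() is a hypothetical helper, not kernel or kexec-tools code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct usable_range {
	uint64_t base;
	uint64_t size;
};

/* Combine 1 or 2 32-bit cells (most significant first) into one value. */
uint64_t read_cells(const uint32_t *cells, int ncells)
{
	uint64_t v = 0;

	while (ncells-- > 0)
		v = (v << 32) | *cells++;
	return v;
}

/* Decode up to max_out (base, size) entries; returns the entry count. */
size_t decode_usable_ranges(const uint32_t *cells, size_t ncells,
			    int addr_cells, int size_cells,
			    struct usable_range *out, size_t max_out)
{
	size_t stride = (size_t)(addr_cells + size_cells);
	size_t count = 0;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);

	while (ncells >= stride && count < max_out) {
		out[count].base = read_cells(cells, addr_cells);
		out[count].size = read_cells(cells + addr_cells, size_cells);
		cells += stride;
		ncells -= stride;
		count++;
	}
	return count;
}
```

Fed the sixteen cells from the dump, this yields four entries, the first being the crashkernel reservation (base 0xe800000, size 0x20000000) followed by the three added ACPI regions.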
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
> 
> Akashi, any pointers on this will be helpful as well.

Please show me:
 * "Virtual kernel memory layout" in dmesg
 * /proc/iomem
 * debug messages from kexec-tools (kexec -d)

-Takahiro AKASHI


> Regards,
> Bhupesh
> 
> 
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 11:18                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming,
	Bhupesh Sharma, kexec, linux-kernel, linux-acpi, James Morse,
	Dave Young, linux-arm-kernel

Bhupesh,

On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> >> to kexec@lists.infradead.org
> >>
> >> Also add linux-acpi list
> >
> > Thank you.
> >
> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> > <ard.biesheuvel@linaro.org> wrote:
> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > > <takahiro.akashi@linaro.org> wrote:
> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> > >>> <takahiro.akashi@linaro.org> wrote:
> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> > >>> >> <takahiro.akashi@linaro.org> wrote:
> >> > >>> >> > Bhupesh, Ard,
> >> > >>> >> >
> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> > >>> >> >> Hi Ard, Akashi
> >> > >>> >> >>
> >> > >>> >> > (snip)
> >> > >>> >> >
> >> > >>> >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow the crash dump kernel to
> >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory
> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> > >>> >> >> for details).
> >> > >>> >> >
> >> > >>> >> > Right.
> >> > >>> >> >
> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> > >>> >> >> with the crashkernel memory range:
> >> > >>> >> >>
> >> > >>> >> >>                 /* add linux,usable-memory-range */
> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> > >>> >> >>                                 address_cells, size_cells);
> >> > >>> >> >>
> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> > >>> >> >> , for details)
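What such a property update stores is a sequence of big-endian 32-bit cells: the base using address_cells cells, then the size using size_cells cells. The sketch below shows only that packing. pack_range_cells() is a made-up name and does not reproduce the kexec-tools fdt_setprop_range() helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store 'value' as 'ncells' big-endian 32-bit cells, high cell first. */
void put_cells(uint8_t *dst, uint64_t value, int ncells)
{
	for (int i = 0; i < ncells; i++) {
		uint32_t cell = (uint32_t)(value >> (32 * (ncells - 1 - i)));

		dst[4 * i + 0] = (uint8_t)(cell >> 24);
		dst[4 * i + 1] = (uint8_t)(cell >> 16);
		dst[4 * i + 2] = (uint8_t)(cell >> 8);
		dst[4 * i + 3] = (uint8_t)cell;
	}
}

/* Pack one (base, size) entry; returns the number of bytes written. */
size_t pack_range_cells(uint8_t *buf, uint64_t base, uint64_t size,
			int addr_cells, int size_cells)
{
	assert(buf != NULL);
	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);

	put_cells(buf, base, addr_cells);
	put_cells(buf + 4 * addr_cells, size, size_cells);
	return 4 * (size_t)(addr_cells + size_cells);
}
```

For the reserved range 0xe800000-0x2e7fffff (size 0x20000000) with two cells each, this produces 16 bytes: 00 00 00 00 0e 80 00 00 00 00 00 00 20 00 00 00.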
> >> > >>> >> >>
> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> > >>> >> >> they are marked as System RAM or as RESERVED, because the
> >> > >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
> >> > >>> >> >>
> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
> >> > >>> >> >>
> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> > >>> >> >>
> >> > >>> >> >> [snip..]
> >> > >>> >> >>
> >> > >>> >> >> Reserved memory range
> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> > >>> >> >>
> >> > >>> >> >> Coredump memory ranges
> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
> >> > >>> >> >>
> >> > >>> >> >> 4). So if we revert Ard's patch or just comment out the memory
> >> > >>> >> >> capping applied for the crash kernel inside
> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> > >>> >> >>
> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> > >>> >> >> {
> >> > >>> >> >>         struct memblock_region reg = {
> >> > >>> >> >>                 .size = 0,
> >> > >>> >> >>         };
> >> > >>> >> >>
> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> > >>> >> >>
> >> > >>> >> >>         /* capping commented out for this experiment:
> >> > >>> >> >>          * if (reg.size)
> >> > >>> >> >>          *         memblock_cap_memory_range(reg.base, reg.size);
> >> > >>> >> >>          */
> >> > >>> >> >> }
> >> > >>> >> >
> >> > >>> >> > Please just don't do that. It can cause a fatal damage on
> >> > >>> >> > memory contents of the *crashed* kernel.
> >> > >>> >> >
> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> > >>> >> >>
> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> > >>> >> >> fail.
> >> > >>> >> >>
> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> > >>> >> >> dt node 'linux,usable-memory-range'
> >> > >>> >> >
> >> > >>> >> > I still don't understand why we need to carry over the information
> >> > >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
> >> > >>> >> > such regions are free to be reused by the kernel after some point of
> >> > >>> >> > initialization. Why does the crash dump kernel need to know about them?
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> > >>> >> kernel, those regions need to be preserved, which is why they are
> >> > >>> >> memblock_reserve()'d now.
> >> > >>> >
> >> > >>> > For my better understanding, who actually accesses such regions
> >> > >>> > during boot time, UEFI itself or the EFI stub?
> >> > >>> >
> >> > >>>
> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> > >>> instance, on QEMU we have
> >> > >>>
> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> > >>>   01000013)
> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> > >>> BXPC 00000001)
> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> > >>> BXPC 00000001)
> >> > >>>
> >> > >>> covered by
> >> > >>>
> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> > >>>  ...
> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> > >>
> >> > >> OK. I mistakenly understood those regions could be freed after exiting
> >> > >> UEFI boot services.
> >> > >>
> >> > >>>
> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> > >>> >> when booting the next kernel.
> >> > >>> >
> >> > >>> > not really.
> >> > >>> >
> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> > >>> >> > on crash dump kernel?)
> >> > >>> >> >
> >> > >>> >>
> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> > >>> >> regions only revealed the bug, not created it (given that other
> >> > >>> >> memblock_reserve regions may be affected as well)
> >> > >>> >
> >> > >>> > Whether we should honor such reserved regions across kexec
> >> > >>> > depends on each one's specific nature, so we will have to handle them one by one.
> >> > >>> > As a matter of fact, no information about "reserved" memblocks is
> >> > >>> > exposed to user space (via /proc/iomem).
> >> > >>> >
> >> > >>>
> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> > >>> as 'System RAM'. Do you think that could solve this?
> >> > >>
> >> > >> Memblock-reserving them is necessary to prevent their corruption and
> >> > >> marking them under another name in /proc/iomem would also be good in order
> >> > >> not to allocate them as part of crash kernel's memory.
> >> > >>
> >> > >
> >> > > I agree. However, this may not be entirely trivial, since iterating
> >> > > over the memblock_reserved table and creating iomem entries may result
> >> > > in collisions.
> >> >
> >> > I found a method (using the patch I shared earlier in this thread) to mark these
> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> > reserved regions.
> >> >
> >> > >> But I'm still not convinced that we should export them in
> >> > >> linux,usable-memory-range to the crash dump kernel. They will be accessed through
> >> > >> acpi_os_map_memory() and so won't need to be part of System RAM
> >> > >> (or memblocks), I guess.
> >> > >
> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > > which is exactly what we want in this case.
> >> >
> >> > Now this is what is confusing me. I don't see the above happening.
> >> >
> >> > I see that the primary kernel boots up and adds the ACPI regions via:
> >> > acpi_os_ioremap
> >> >     -> ioremap_cache
> >> >
> >> > But during the crashkernel boot, 'acpi_os_ioremap' calls
> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> > variant.
> >
> > That is expected if the region is outside the memblocks.
> 
> Thanks for the confirmation. This was my understanding as well.
> 
> >> > And it fails while accessing the ACPI tables:
> >> >
> >> > [    0.039205] ACPI: Core revision 20170728
> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> >
> > > This (ESR = 0x96000021) means that a Data Abort with an alignment fault
> > > happened. As ioremap() maps the region as "Device memory", unaligned
> > > memory accesses are not allowed.
> >
> >> > [    0.100022] Modules linked in:
> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> >> > pstate: 60000045
> >> > [    0.132647] sp : ffff000008ccfb40
> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> >> > [    0.223224] Call trace:
> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
> >> > ffff0000095e3980 ffff000008ccfbe0
> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> >> > ffff000008ccfc50 0000000000000000
> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
> >> > 00000000ffffff76 0000000000000006
> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> >> > 000000000000038e 0000000000000000
> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
> >> > 0000000000000005 000000000000001b
> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> >> > ffff000009710027 0000000000000001
> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
> >> > 0000000000000000 ffff0000088be820
> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> >> > ffff00000849b4f8 ffff000008ccfb40
> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
> >> > ffff000008ccfb40 ffff000008260a18
> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> >> > ffff000008ccfb40 ffff0000084a6764
> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
> >> > [    0.404437] Rebooting in 10 seconds.
> >> >
> >> > So, I think the linear mapping done by the primary kernel does not
> >> > make these accessible in the crash kernel directly.
> >> >
> >> > Any pointers?
> >>
> >> Can you get the code line number for acpi_ns_lookup+0x25c?
> >
> > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
> > > modify acpi_ns_lookup() (or any other ACPI functions) to prevent unaligned
> > > accesses?
> > > (I couldn't work out how unaligned accesses could happen there.)
> >
> 
> Right. As I noted earlier in this thread (perhaps in the first email on
> this subject), this is indeed an unaligned address access.
> 
> Now, modifying acpi_os_ioremap() to not ioremap(), and thus not mapping
> this memory range as device memory, doesn't seem a neat solution: it
> means we are not marking something with the right memory attribute, and
> we could run into similar/related issues later.
> 
> Regarding the latter suggestion, what I am seeing now is that the ACPI
> table access functions are perhaps reused from the earlier x86
> implementation, but on arm64 (or even arm) we should not be
> allowing unaligned accesses, which might cause UNDEFINED behaviour and
> a resultant crash.
> 
> So I can try this approach and see if it works for me.
> 
> However, I am still not sure why the crashkernel ranges
> historically do not include the System RAM regions (which may include
> the ACPI regions as well). These regions are available for kernel
> use and perhaps should be exported to the crashkernel as well.
> 
> I am not fully aware of the previous discussions on capping the
> crashkernel memory being passed to the kdump kernel, but did we run
> into any issues while doing so?
> 
> Also, even if I extend the kexec-tools to modify the
> linux,usable-memory-range and add the ACPI regions to it, the
> crashkernel fails to boot with the below message (I have added some
> logic to print the DTB on the crash kernel boot start):
> 
> [    0.000000]     chosen {
> [    0.000000]         linux,usable-memory-range
> [    0.000000]  = <
> [    0.000000] 0x00000000
> [    0.000000] 0x0e800000
> [    0.000000] 0x00000000
> [    0.000000] 0x20000000
> [    0.000000] 0x00000000
> [    0.000000] 0x396c0000
> [    0.000000] 0x00000000
> [    0.000000] 0x000a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x39770000
> [    0.000000] 0x00000000
> [    0.000000] 0x00040000
> [    0.000000] 0x00000000
> [    0.000000] 0x398a0000
> [    0.000000] 0x00000000
> [    0.000000] 0x00020000
> [    0.000000] >
> [    0.000000] ;
> 
> [snip..]
> 
> [    0.000000] linux,usable-memory-range base e800000, size 20000000
> [    0.000000]  - e800000 ,  20000000
> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
> [    0.000000]  - 396c0000 ,  a0000
> [    0.000000] linux,usable-memory-range base 39770000, size 40000
> [    0.000000]  - 39770000 ,  40000
> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
> [    0.000000]  - 398a0000 ,  20000
> [    0.000000] initrd not fully accessible via the linear mapping --
> please check your bootloader ...
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
> arm64_memblock_init+0x210/0x484
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
> pstate: 600000c5
> [    0.000000] sp : ffff000008ccfe80
> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
> [    0.000000] Call trace:
> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
> [    0.000000] fd40: 0000000000000056 0000000000000000
> 0000000000000000 0000000000000000
> [    0.000000] fd60: 0000000000000001 ffff000008c96360
> 000000000000000d 746f6f622072756f
> [    0.000000] fd80: ffff000008517414 00000000000000f4
> 2065687420616976 6d207261656e696c
> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
> 79206b6365686320 000000002be00842
> [    0.000000] fdc0: ffff000008d05580 0000000000000000
> 000000000c283806 ffff000008afa000
> [    0.000000] fde0: ffff000008080000 ffff000008afa000
> ffff000009680000 ffff000008ec0000
> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
> 00000000013b0000 0000000011230000
> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
> ffff000008b76984 ffff000008ccfe80
> [    0.000000] fe40: ffff000008b76984 00000000600000c5
> ffff00000959b7a8 ffff000008ec0000
> [    0.000000] fe60: ffffffffffffffff 0000000000000005
> ffff000008ccfe80 ffff000008b76984
> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x50/0x6c with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 512 MiB
> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
> 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
> ------------   4.14.0+ #7
> [    0.000000] Call trace:
> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
> allocate 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> 
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
> 
> Akashi, any pointers on this will be helpful as well.
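
For reference, the property dump quoted above can be decoded by hand.
Below is a minimal user-space sketch, assuming #address-cells = 2 and
#size-cells = 2 (which matches the base/size lines the kernel printed)
and that the cells are already in host byte order. decode_ranges() and
range_covers() are hypothetical helper names, not kexec-tools or kernel
functions. The address 0x39710027 is my reading of the faulting
physical address from the *pte value and x22 in the earlier oops, so
treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (base, size) pair decoded from linux,usable-memory-range. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Cells exactly as printed in the DTB dump quoted above. */
static const uint32_t usable_cells[] = {
	0x00000000, 0x0e800000, 0x00000000, 0x20000000,
	0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
	0x00000000, 0x39770000, 0x00000000, 0x00040000,
	0x00000000, 0x398a0000, 0x00000000, 0x00020000,
};

/*
 * Decode cells into ranges, assuming two address cells and two size
 * cells per entry (an assumption, see above).
 * Returns the number of ranges written to out.
 */
static size_t decode_ranges(const uint32_t *cells, size_t ncells,
			    struct mem_range *out, size_t max_out)
{
	size_t n = 0;

	assert(ncells % 4 == 0);
	for (size_t i = 0; i + 3 < ncells && n < max_out; i += 4, n++) {
		out[n].base = ((uint64_t)cells[i] << 32) | cells[i + 1];
		out[n].size = ((uint64_t)cells[i + 2] << 32) | cells[i + 3];
	}
	return n;
}

/* Return 1 if [addr, addr + len) lies entirely inside one of the ranges. */
static int range_covers(const struct mem_range *r, size_t n,
			uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].base && len <= r[i].size &&
		    addr - r[i].base <= r[i].size - len)
			return 1;
	return 0;
}
```

With only the first (crashkernel) range, range_covers() returns 0 for a
4-byte access at 0x39710027; with all four ranges it returns 1, since
that address falls inside the 0x396c0000 + 0xa0000 entry.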

Please show me:
 * "Virtual kernel memory layout" in dmesg
 * /proc/iomem
 * debug messages from kexec-tools (kexec -d)

-Takahiro AKASHI


> Regards,
> Bhupesh
> 
> 
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>
> >> > _______________________________________________
> >> > kexec mailing list -- kexec@lists.fedoraproject.org
> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org


^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18  5:16                                                             ` Dave Young
  (?)
  (?)
@ 2017-12-18 21:28                                                               ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 21:28 UTC (permalink / raw)
  To: Dave Young
  Cc: Ard Biesheuvel, kexec, linux-acpi, linux-kernel, AKASHI Takahiro,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

Hi Dave,

On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
>
> Also add linux-acpi list
> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >>> <takahiro.akashi@linaro.org> wrote:
>> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >>> >> <takahiro.akashi@linaro.org> wrote:
>> >>> >> > Bhupesh, Ard,
>> >>> >> >
>> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >>> >> >> Hi Ard, Akashi
>> >>> >> >>
>> >>> >> > (snip)
>> >>> >> >
>> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >>> >> >> , for details)
>> >>> >> >
>> >>> >> > Right.
>> >>> >> >
>> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >>> >> >> with the crashkernel memory range:
>> >>> >> >>
>> >>> >> >>                 /* add linux,usable-memory-range */
>> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>> >> >>                                 address_cells, size_cells);
>> >>> >> >>
>> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >>> >> >> , for details)
>> >>> >> >>
>> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >>> >> >> they are marked as System RAM or as RESERVED, because the
>> >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
>> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
>> >>> >> >>
>> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >>> >> >> ACPI memory and crashes while trying to access the same:
>> >>> >> >>
>> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>> >>> >> >>
>> >>> >> >> [snip..]
>> >>> >> >>
>> >>> >> >> Reserved memory range
>> >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >>> >> >>
>> >>> >> >> Coredump memory ranges
>> >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >>> >> >> 000000002e800000-000000003961ffff (0)
>> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> >> >> 000000a000000000-000000affbffffff (0)
>> >>> >> >>
>> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> >> >> memory cap'ing passed to the crash kernel inside
>> >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> >> >>
>> >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >>> >> >> {
>> >>> >> >>         struct memblock_region reg = {
>> >>> >> >>                 .size = 0,
>> >>> >> >>         };
>> >>> >> >>
>> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> >> >>
>> >>> >> >>         if (reg.size)
>> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >>> >> >> }
>> >>> >> >
>> >>> >> > Please just don't do that. It can fatally damage the
>> >>> >> > memory contents of the *crashed* kernel.
>> >>> >> >
>> >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >>> >> >>
>> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> >> >> fail.
>> >>> >> >>
>> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >>> >> >> dt node 'linux,usable-memory-range'
>> >>> >> >
>> >>> >> > I still don't understand why we need to carry over the information
>> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >>> >> > such regions are free to be reused by the kernel after some point of
>> >>> >> > initialization. Why does crash dump kernel need to know about them?
>> >>> >> >
>> >>> >>
>> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> >> kernel, those regions need to be preserved, which is why they are
>> >>> >> memblock_reserve()'d now.
>> >>> >
>> >>> > For my better understandings, who is actually accessing such regions
>> >>> > during boot time, uefi itself or efistub?
>> >>> >
>> >>>
>> >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >>> instance, on QEMU we have
>> >>>
>> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >>>   01000013)
>> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >>> BXPC 00000001)
>> >>>
>> >>> covered by
>> >>>
>> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>>  ...
>> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>
>> >> OK. I mistakenly understood those regions could be freed after exiting
>> >> UEFI boot services.
>> >>
>> >>>
>> >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> >> when booting the next kernel.
>> >>> >
>> >>> > not really.
>> >>> >
>> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> >> > on crash dump kernel?)
>> >>> >> >
>> >>> >>
>> >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> >> regions only revealed the bug, not created it (given that other
>> >>> >> memblock_reserve regions may be affected as well)
>> >>> >
>> >>> > As whether we should honor such reserved regions over kexec'ing
>> >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> >>> > As a matter of fact, no information about "reserved" memblocks is
>> >>> > exposed to user space (via proc/iomem).
>> >>> >
>> >>>
>> >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >>> as 'System RAM'. Do you think that could solve this?
>> >>
>> >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> marking them under another name in /proc/iomem would also be good in order
>> >> not to allocate them as part of crash kernel's memory.
>> >>
>> >
>> > I agree. However, this may not be entirely trivial, since iterating
>> > over the memblock_reserved table and creating iomem entries may result
>> > in collisions.
>>
>> I found a method (using the patch I shared earlier in this thread) to mark these
>> entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> reserved regions.
>>
>> >> But I'm still not convinced that we should export them in
>> >> usable-memory-range to the crash dump kernel. They will be accessed
>> >> through acpi_os_map_memory() and so won't need to be part of system RAM
>> >> (or memblocks), I guess.
>> >
>> > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > which is exactly what we want in this case.
>>
>> Now this is what is confusing me. I don't see the above happening.
>>
>> I see that the primary kernel boots up and adds the ACPI regions via:
>> acpi_os_ioremap
>>     -> ioremap_cache
>>
>> But during the crashkernel boot, 'acpi_os_ioremap' calls
>> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> variant.
>>
>> And it fails while accessing the ACPI tables:
>>
>> [    0.039205] ACPI: Core revision 20170728
>> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> [    0.100022] Modules linked in:
>> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> pstate: 60000045
>> [    0.132647] sp : ffff000008ccfb40
>> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> [    0.223224] Call trace:
>> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> [    0.232194] fa00: 0000000000000000 ffff000009710027
>> ffff0000095e3980 ffff000008ccfbe0
>> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> ffff000008ccfc50 0000000000000000
>> [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> 00000000ffffff76 0000000000000006
>> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> 000000000000038e 0000000000000000
>> [    0.263843] fa80: 0000000000000000 0000000000000000
>> 0000000000000005 000000000000001b
>> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> ffff000009710027 0000000000000001
>> [    0.279667] fac0: 0000000000000001 000000000000001b
>> 0000000000000000 ffff0000088be820
>> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> ffff00000849b4f8 ffff000008ccfb40
>> [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> ffff000008ccfb40 ffff000008260a18
>> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> ffff000008ccfb40 ffff0000084a6764
>> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> [    0.399160] Kernel panic - not syncing: Fatal exception
>> [    0.404437] Rebooting in 10 seconds.
>>
>> So, I think the linear mapping done by the primary kernel does not
>> make these accessible in the crash kernel directly.
>>
>> Any pointers?
>
> Can you get the code line number for acpi_ns_lookup+0x25c?

gdb points to the following code line number:

(gdb) list *(acpi_ns_lookup+0x25c)
0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
572                }
573            }
574
575            /* Extract one ACPI name from the front of the pathname */
576
577            ACPI_MOVE_32_TO_32(&simple_name, path);
578
579            /* Try to find the single (4 character) ACPI name */
580
581            status =
(gdb)

i.e. ACPI_MOVE_32_TO_32(&simple_name, path);

addr2line also confirms the same:

# addr2line -e  vmlinux ffff0000084aa250
/root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577
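
To spell out why this line can fault, here is a small user-space sketch
(not kernel or ACPICA code). The ESR field positions (EC in bits
[31:26], DFSC in bits [5:0]) are from the Arm architecture; all function
names are hypothetical. By my reading, the oops value 96000021 decodes
to EC 0x25 (data abort without a change in exception level) and DFSC
0x21 (alignment fault), and the faulting opcode (b94002c0) looks like
'ldr w0, [x22]' with x22 = ffff000009710027, which is not 4-byte
aligned. move_32_bytewise() only illustrates a copy that never issues an
access wider than one byte; it is not the ACPICA macro. If I remember
correctly, ACPICA has byte-wise variants of these macros under
ACPI_MISALIGNMENT_NOT_SUPPORTED, but please verify that before relying
on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Exception class: ESR_ELx bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data fault status code: ESR_ELx bits [5:0] (valid for data aborts). */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* Return 1 if addr is a multiple of size; size must be a power of two. */
static int is_aligned(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (size - 1)) == 0;
}

/*
 * Copy four bytes one at a time. The volatile qualifiers keep the
 * compiler from merging the byte accesses into one 32-bit access,
 * which is the access that is not allowed on an unaligned address in
 * a device mapping.
 */
static void move_32_bytewise(void *dst, const void *src)
{
	const volatile uint8_t *s = src;
	volatile uint8_t *d = dst;

	d[0] = s[0];
	d[1] = s[1];
	d[2] = s[2];
	d[3] = s[3];
}
```

With a normal (cacheable) mapping the unaligned 32-bit load is
tolerated, which would explain why the primary kernel does not hit
this; with a device mapping every access has to be naturally aligned.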


Regards,
Bhupesh


>>
>> Regards,
>> Bhupesh
>>
>> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> via a kernel command line parameter, "memmap=".
>> >>
>> _______________________________________________
>> kexec mailing list -- kexec@lists.fedoraproject.org
>> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 21:28                                                               ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 21:28 UTC (permalink / raw)
  To: Dave Young
  Cc: Ard Biesheuvel, kexec, linux-acpi, linux-kernel, AKASHI Takahiro,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

Hi Dave,

On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
>
> Also add linux-acpi list
> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >>> <takahiro.akashi@linaro.org> wrote:
>> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >>> >> <takahiro.akashi@linaro.org> wrote:
>> >>> >> > Bhupesh, Ard,
>> >>> >> >
>> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >>> >> >> Hi Ard, Akashi
>> >>> >> >>
>> >>> >> > (snip)
>> >>> >> >
>> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >>> >> >> , for details)
>> >>> >> >
>> >>> >> > Right.
>> >>> >> >
>> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >>> >> >> with the crashkernel memory range:
>> >>> >> >>
>> >>> >> >>                 /* add linux,usable-memory-range */
>> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>> >> >>                                 address_cells, size_cells);
>> >>> >> >>
>> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >>> >> >> , for details)
>> >>> >> >>
>> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >>> >> >> they are marked as System RAM or as RESERVED, because the
>> >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
>> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
>> >>> >> >>
>> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >>> >> >> ACPI memory and crashes while trying to access the same:
>> >>> >> >>
>> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>> >>> >> >>
>> >>> >> >> [snip..]
>> >>> >> >>
>> >>> >> >> Reserved memory range
>> >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >>> >> >>
>> >>> >> >> Coredump memory ranges
>> >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >>> >> >> 000000002e800000-000000003961ffff (0)
>> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> >> >> 000000a000000000-000000affbffffff (0)
>> >>> >> >>
>> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> >> >> memory cap'ing passed to the crash kernel inside
>> >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> >> >>
>> >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >>> >> >> {
>> >>> >> >>         struct memblock_region reg = {
>> >>> >> >>                 .size = 0,
>> >>> >> >>         };
>> >>> >> >>
>> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> >> >>
>> >>> >> >>         if (reg.size)
>> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >>> >> >> }
>> >>> >> >
>> >>> >> > Please just don't do that. It can fatally damage the
>> >>> >> > memory contents of the *crashed* kernel.
>> >>> >> >
>> >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >>> >> >>
>> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> >> >> fail.
>> >>> >> >>
>> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >>> >> >> dt node 'linux,usable-memory-range'
>> >>> >> >
>> >>> >> > I still don't understand why we need to carry over the information
>> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >>> >> > such regions are free to be reused by the kernel after some point of
>> >>> >> > initialization. Why does crash dump kernel need to know about them?
>> >>> >> >
>> >>> >>
>> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> >> kernel, those regions need to be preserved, which is why they are
>> >>> >> memblock_reserve()'d now.
>> >>> >
>> >>> > For my better understandings, who is actually accessing such regions
>> >>> > during boot time, uefi itself or efistub?
>> >>> >
>> >>>
>> >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >>> instance, on QEMU we have
>> >>>
>> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >>>   01000013)
>> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >>> BXPC 00000001)
>> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >>> BXPC 00000001)
>> >>>
>> >>> covered by
>> >>>
>> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>>  ...
>> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>
>> >> OK. I mistakenly understood those regions could be freed after exiting
>> >> UEFI boot services.
>> >>
>> >>>
>> >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> >> when booting the next kernel.
>> >>> >
>> >>> > not really.
>> >>> >
>> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> >> > on crash dump kernel?)
>> >>> >> >
>> >>> >>
>> >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> >> regions only revealed the bug, not created it (given that other
>> >>> >> memblock_reserve regions may be affected as well)
>> >>> >
>> >>> > As whether we should honor such reserved regions over kexec'ing
>> >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> >>> > As a matter of fact, no information about "reserved" memblocks is
>> >>> > exposed to user space (via proc/iomem).
>> >>> >
>> >>>
>> >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >>> as 'System RAM'. Do you think that could solve this?
>> >>
>> >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> marking them under another name in /proc/iomem would also be good in order
>> >> not to allocate them as part of crash kernel's memory.
>> >>
>> >
>> > I agree. However, this may not be entirely trivial, since iterating
>> > over the memblock_reserved table and creating iomem entries may result
>> > in collisions.
>>
>> I found a method (using the patch I shared earlier in this thread) to mark these
>> entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> reserved regions.
>>
>> >> But I'm still not convinced that we should export them in
>> >> usable-memory-range to the crash dump kernel. They will be accessed
>> >> through acpi_os_map_memory() and so won't need to be part of system RAM
>> >> (or memblocks), I guess.
>> >
>> > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > which is exactly what we want in this case.
>>
>> Now this is what is confusing me. I don't see the above happening.
>>
>> I see that the primary kernel boots up and adds the ACPI regions via:
>> acpi_os_ioremap
>>     -> ioremap_cache
>>
>> But during the crashkernel boot, 'acpi_os_ioremap' calls
>> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> variant.
>>
>> And it fails while accessing the ACPI tables:
>>
>> [    0.039205] ACPI: Core revision 20170728
>> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> [    0.100022] Modules linked in:
>> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045
>> [    0.132647] sp : ffff000008ccfb40
>> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> [    0.223224] Call trace:
>> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> [    0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0
>> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000
>> [    0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006
>> [    0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
>> [    0.263843] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
>> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001
>> [    0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
>> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40
>> [    0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18
>> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764
>> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> [    0.399160] Kernel panic - not syncing: Fatal exception
>> [    0.404437] Rebooting in 10 seconds.
>>
>> So, I think the linear mapping done by the primary kernel does not
>> make these accessible in the crash kernel directly.
>>
>> Any pointers?
>
> Can you get the code line number for acpi_ns_lookup+0x25c?

gdb points to the following code line number:

(gdb) list *(acpi_ns_lookup+0x25c)
0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
572                }
573            }
574
575            /* Extract one ACPI name from the front of the pathname */
576
577            ACPI_MOVE_32_TO_32(&simple_name, path);
578
579            /* Try to find the single (4 character) ACPI name */
580
581            status =
(gdb)

i.e. ACPI_MOVE_32_TO_32(&simple_name, path);
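
As a cross-check, the faulting source line is a 4-byte copy from 'path', and the values in the oops can be decoded to see what kind of abort that load raised. The Python sketch below is only an illustration: the ESR and descriptor field positions are the architectural ones, but the helper names, the 4K-granule/48-bit output-address mask, and the reading of attribute index 1 as Device-nGnRE (the MT_DEVICE_nGnRE slot in this kernel's MAIR setup) are assumptions, not something stated in this thread.

```python
# Values copied verbatim from the oops quoted above.
ESR = 0x96000021            # "Internal error: Oops: 96000021"
PTE = 0x00E8000039710707    # "*pte=00e8000039710707"
INSN = 0xB94002C0           # the bracketed word on the "Code:" line
X22 = 0xFFFF000009710027    # x22 from the register dump


def decode_esr(esr):
    """Split an ESR_ELx value into its architectural fields."""
    iss = esr & 0x1FFFFFF              # ISS, bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,       # instruction length, bit [25]
        "wnr": (iss >> 6) & 0x1,       # write-not-read, ISS bit [6]
        "dfsc": iss & 0x3F,            # data fault status code, ISS bits [5:0]
    }


def decode_pte(pte):
    """Extract the attribute index and output address from a page descriptor.

    Assumption: the output address sits in bits [47:12] (4K granule, 48-bit PA).
    """
    return {
        "attr_index": (pte >> 2) & 0x7,        # AttrIndx, bits [4:2]
        "phys": pte & 0x0000FFFFFFFFF000,      # output address
    }


def load_base_register(insn):
    """Return Rn (bits [9:5]) of an A64 load/store instruction word."""
    return (insn >> 5) & 0x1F


if __name__ == "__main__":
    esr = decode_esr(ESR)
    pte = decode_pte(PTE)
    # EC 0x25 is a data abort taken without a change in exception level;
    # DFSC 0x21 is an alignment fault.
    print("EC=%#x IL=%d WnR=%d DFSC=%#x" % (esr["ec"], esr["il"], esr["wnr"], esr["dfsc"]))
    print("AttrIndx=%d phys=%#x" % (pte["attr_index"], pte["phys"]))
    print("base register: x%d" % load_base_register(INSN))
    print("x22 misalignment for a 4-byte access: %d" % (X22 % 4))
```

If that reading holds, the oops is an alignment fault on an unaligned 32-bit load (base register x22) through a Device-type mapping, which would fit the observation that the crash kernel maps these regions with ioremap() rather than ioremap_cache().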

addr2line also confirms the same:

# addr2line -e  vmlinux ffff0000084aa250
/root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577
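
One thing that may be worth double-checking: the address gdb resolved for acpi_ns_lookup+0x25c differs from the PC reported in the oops for the same symbol and offset. A minimal sketch of the arithmetic, using only the two addresses quoted above (the helper name is made up for illustration):

```python
OOPS_PC = 0xFFFF0000084A6764    # "pc :" in the oops, i.e. acpi_ns_lookup+0x25c at run time
GDB_ADDR = 0xFFFF0000084AA250   # address gdb and addr2line resolved in vmlinux
OFFSET = 0x25C                  # offset into acpi_ns_lookup


def symbol_base(addr, offset):
    """Return the start address of a symbol given symbol+offset."""
    return addr - offset


running_base = symbol_base(OOPS_PC, OFFSET)
vmlinux_base = symbol_base(GDB_ADDR, OFFSET)

print("acpi_ns_lookup in the crashing kernel: %#x" % running_base)
print("acpi_ns_lookup in the vmlinux given to gdb: %#x" % vmlinux_base)
print("difference: %#x" % (vmlinux_base - running_base))
```

The two bases differ, which suggests the vmlinux passed to gdb and addr2line may not be the exact image that produced the oops; the reported source line could be re-checked against the matching vmlinux.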


Regards,
Bhupesh


>>
>> Regards,
>> Bhupesh
>>
>> >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
>> >> via a kernel command line parameter, "memmap=".
>> >>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 21:28                                                               ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 21:28 UTC (permalink / raw)
  To: Dave Young
  Cc: Mark Rutland, linux-efi, AKASHI Takahiro, Matt Fleming,
	Ard Biesheuvel, kexec, linux-kernel, linux-acpi, James Morse,
	Bhupesh SHARMA, linux-arm-kernel

Hi Dave,

On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
>
> Also add linux-acpi list
> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >>> <takahiro.akashi@linaro.org> wrote:
>> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >>> >> <takahiro.akashi@linaro.org> wrote:
>> >>> >> > Bhupesh, Ard,
>> >>> >> >
>> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >>> >> >> Hi Ard, Akashi
>> >>> >> >>
>> >>> >> > (snip)
>> >>> >> >
>> >>> >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
>> >>> >> >> 'linux,usable-memory-range' dt property to allow the crash dump kernel to
>> >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >>> >> >> other memory areas that are part of the panicked kernel's memory
>> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >>> >> >> for details).
>> >>> >> >
>> >>> >> > Right.
>> >>> >> >
>> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >>> >> >> with the crashkernel memory range:
>> >>> >> >>
>> >>> >> >>                 /* add linux,usable-memory-range */
>> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>> >> >>                                 address_cells, size_cells);
>> >>> >> >>
>> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >>> >> >> for details)
>> >>> >> >>
>> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >>> >> >> they are marked as System RAM or as RESERVED, because the
>> >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
>> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
>> >>> >> >>
>> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >>> >> >> ACPI memory and crashes while trying to access the same:
>> >>> >> >>
>> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>> >>> >> >>
>> >>> >> >> [snip..]
>> >>> >> >>
>> >>> >> >> Reserved memory range
>> >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >>> >> >>
>> >>> >> >> Coredump memory ranges
>> >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >>> >> >> 000000002e800000-000000003961ffff (0)
>> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> >> >> 000000a000000000-000000affbffffff (0)
>> >>> >> >>
>> >>> >> >> 4). So if we revert Ard's patch or just comment out the memory
>> >>> >> >> capping applied to the crash kernel inside
>> >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> >> >>
>> >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >>> >> >> {
>> >>> >> >>         struct memblock_region reg = {
>> >>> >> >>                 .size = 0,
>> >>> >> >>         };
>> >>> >> >>
>> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> >> >>
>> >>> >> >>         /* Both lines are commented out for this experiment. */
>> >>> >> >>         //if (reg.size)
>> >>> >> >>         //        memblock_cap_memory_range(reg.base, reg.size);
>> >>> >> >> }
>> >>> >> >
>> >>> >> > Please just don't do that. It can fatally corrupt the
>> >>> >> > memory contents of the *crashed* kernel.
>> >>> >> >
>> >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >>> >> >>
>> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> >> >> fail.
>> >>> >> >>
>> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >>> >> >> dt node 'linux,usable-memory-range'
>> >>> >> >
>> >>> >> > I still don't understand why we need to carry over the information
>> >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
>> >>> >> > such regions are free to be reused by the kernel after some point of
>> >>> >> > initialization. Why does the crash dump kernel need to know about them?
>> >>> >> >
>> >>> >>
>> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> >> kernel, those regions need to be preserved, which is why they are
>> >>> >> memblock_reserve()'d now.
>> >>> >
>> >>> > To help me understand better: who actually accesses such regions
>> >>> > during boot time, UEFI itself or the EFI stub?
>> >>> >
>> >>>
>> >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >>> instance, on QEMU we have
>> >>>
>> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001   01000013)
>> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
>> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
>> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
>> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
>> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
>> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
>> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
>> >>>
>> >>> covered by
>> >>>
>> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>>  ...
>> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>
>> >> OK. I mistakenly understood those regions could be freed after exiting
>> >> UEFI boot services.
>> >>
>> >>>
>> >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> >> when booting the next kernel.
>> >>> >
>> >>> > not really.
>> >>> >
>> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> >> > on crash dump kernel?)
>> >>> >> >
>> >>> >>
>> >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> >> regions only revealed the bug, not created it (given that other
>> >>> >> memblock_reserve regions may be affected as well)
>> >>> >
>> >>> > Whether we should honor such reserved regions across kexec depends on
>> >>> > the specific nature of each one, so we will have to handle them one by one.
>> >>> > As a matter of fact, no information about "reserved" memblocks is
>> >>> > exposed to user space (via /proc/iomem).
>> >>> >
>> >>>
>> >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >>> as 'System RAM'. Do you think that could solve this?
>> >>
>> >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> marking them under another name in /proc/iomem would also be good in order
>> >> not to allocate them as part of crash kernel's memory.
>> >>
>> >
>> > I agree. However, this may not be entirely trivial, since iterating
>> > over the memblock_reserved table and creating iomem entries may result
>> > in collisions.
>>
>> I found a method (using the patch I shared earlier in this thread) to mark these
>> entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> reserved regions.
>>
>> >> But I'm still not convinced that we should export them in
>> >> 'linux,usable-memory-range' to the crash dump kernel. They will be accessed
>> >> through acpi_os_map_memory() and so won't need to be part of System RAM
>> >> (or memblocks), I guess.
>> >
>> > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > which is exactly what we want in this case.
>>
>> Now this is what is confusing me. I don't see the above happening.
>>
>> I see that the primary kernel boots up and adds the ACPI regions via:
>> acpi_os_ioremap
>>     -> ioremap_cache
>>
>> But during the crashkernel boot, 'acpi_os_ioremap' calls
>> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> variant.
>>
>> And it fails while accessing the ACPI tables:
>>
>> [    0.039205] ACPI: Core revision 20170728
>> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> [    0.100022] Modules linked in:
>> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045
>> [    0.132647] sp : ffff000008ccfb40
>> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> [    0.223224] Call trace:
>> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> [    0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0
>> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000
>> [    0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006
>> [    0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
>> [    0.263843] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
>> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001
>> [    0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
>> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40
>> [    0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18
>> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764
>> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> [    0.399160] Kernel panic - not syncing: Fatal exception
>> [    0.404437] Rebooting in 10 seconds.
>>
>> So, I think the linear mapping done by the primary kernel does not
>> make these accessible in the crash kernel directly.
>>
>> Any pointers?
>
> Can you get the code line number for acpi_ns_lookup+0x25c?

gdb points to the following code line number:

(gdb) list *(acpi_ns_lookup+0x25c)
0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
572                }
573            }
574
575            /* Extract one ACPI name from the front of the pathname */
576
577            ACPI_MOVE_32_TO_32(&simple_name, path);
578
579            /* Try to find the single (4 character) ACPI name */
580
581            status =
(gdb)

i.e. ACPI_MOVE_32_TO_32(&simple_name, path);

addr2line also confirms the same:

# addr2line -e  vmlinux ffff0000084aa250
/root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577


Regards,
Bhupesh


>>
>> Regards,
>> Bhupesh
>>
>> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> via a kernel command line parameter, "memmap=".
>> >>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18 11:18                                                                     ` AKASHI Takahiro
  (?)
  (?)
@ 2017-12-18 22:28                                                                       ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma,
	Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, linux-efi, Mark Rutland,
	Matt Fleming

On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Bhupesh,
>
> On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
>> >> to kexec@lists.infradead.org
>> >>
>> >> Also add linux-acpi list
>> >
>> > Thank you.
>> >
>> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> >> > <ard.biesheuvel@linaro.org> wrote:
>> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> >> > > <takahiro.akashi@linaro.org> wrote:
>> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> > >>> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> > >>> >> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> >> > Bhupesh, Ard,
>> >> > >>> >> >
>> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> > >>> >> >> Hi Ard, Akashi
>> >> > >>> >> >>
>> >> > >>> >> > (snip)
>> >> > >>> >> >
>> >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> > >>> >> >> , for details)
>> >> > >>> >> >
>> >> > >>> >> > Right.
>> >> > >>> >> >
>> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> > >>> >> >> with the crashkernel memory range:
>> >> > >>> >> >>
>> >> > >>> >> >>                 /* add linux,usable-memory-range */
>> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> > >>> >> >>                                 address_cells, size_cells);
>> >> > >>> >> >>
>> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> > >>> >> >> , for details)
>> >> > >>> >> >>
>> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> > >>> >> >> they are marked as System RAM or as RESERVED. As,
>> >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >> > >>> >> >>
>> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> >> > >>> >> >>
>> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> > >>> >> >> -r`.img --reuse-cmdline -d
>> >> > >>> >> >>
>> >> > >>> >> >> [snip..]
>> >> > >>> >> >>
>> >> > >>> >> >> Reserved memory range
>> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> Coredump memory ranges
>> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> > >>> >> >> memory cap'ing passed to the crash kernel inside
>> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >> > >>> >> >>
>> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >> > >>> >> >> {
>> >> > >>> >> >>         struct memblock_region reg = {
>> >> > >>> >> >>                 .size = 0,
>> >> > >>> >> >>         };
>> >> > >>> >> >>
>> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> > >>> >> >>
>> >> > >>> >> >>         if (reg.size)
>> >> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >> > >>> >> >> comment this out */
>> >> > >>> >> >> }
>> >> > >>> >> >
>> >> > >>> >> > Please just don't do that. It can cause a fatal damage on
>> >> > >>> >> > memory contents of the *crashed* kernel.
>> >> > >>> >> >
>> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >> > >>> >> >>
>> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> > >>> >> >> fail.
>> >> > >>> >> >>
>> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> > >>> >> >> dt node 'linux,usable-memory-range'
>> >> > >>> >> >
>> >> > >>> >> > I still don't understand why we need to carry over the information
>> >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >> > >>> >> > such regions are free to be reused by the kernel after some point of
>> >> > >>> >> > initialization. Why does crash dump kernel need to know about them?
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> > >>> >> kernel, those regions needs to be preserved, which is why they are
>> >> > >>> >> memblock_reserve()'d now.
>> >> > >>> >
>> >> > >>> > For my better understandings, who is actually accessing such regions
>> >> > >>> > during boot time, uefi itself or efistub?
>> >> > >>> >
>> >> > >>>
>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >> > >>> instance, on QEMU we have
>> >> > >>>
>> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> > >>>   01000013)
>> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>
>> >> > >>> covered by
>> >> > >>>
>> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> > >>>  ...
>> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> > >>
>> >> > >> OK. I mistakenly understood those regions could be freed after exiting
>> >> > >> UEFI boot services.
>> >> > >>
>> >> > >>>
>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >> > >>> >> when booting the next kernel.
>> >> > >>> >
>> >> > >>> > not really.
>> >> > >>> >
>> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> > >>> >> > on crash dump kernel?)
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >> > >>> >> regions only revealed the bug, not created it (given that other
>> >> > >>> >> memblock_reserve regions may be affected as well)
>> >> > >>> >
>> >> > >>> > As whether we should honor such reserved regions over kexec'ing
>> >> > >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> >> > >>> > As a matter of fact, no information about "reserved" memblocks is
>> >> > >>> > exposed to user space (via proc/iomem).
>> >> > >>> >
>> >> > >>>
>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >> > >>> as 'System RAM'. Do you think that could solve this?
>> >> > >>
>> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> > >> marking them under another name in /proc/iomem would also be good in order
>> >> > >> not to allocate them as part of crash kernel's memory.
>> >> > >>
>> >> > >
>> >> > > I agree. However, this may not be entirely trivial, since iterating
>> >> > > over the memblock_reserved table and creating iomem entries may result
>> >> > > in collisions.
>> >> >
>> >> > I found a method (using the patch I shared earlier in this thread) to mark these
>> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> >> > reserved regions.
>> >> >
>> >> > >> But I'm not still convinced that we should export them in useable-
>> >> > >> memory-range to crash dump kernel. They will be accessed through
>> >> > >> acpi_os_map_memory() and so won't be required to be part of system ram
>> >> > >> (or memblocks), I guess.
>> >> > >
>> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> >> > > which is exactly what we want in this case.
>> >> >
>> >> > Now this is what is confusing me. I don't see the above happening.
>> >> >
>> >> > I see that the primary kernel boots up and adds the ACPI regions via:
>> >> > acpi_os_ioremap
>> >> >     -> ioremap_cache
>> >> >
>> >> > But during the crashkernel boot, 'acpi_os_ioremap' calls
>> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> >> > variant.
>> >
>> > It is natural if that region is out of memblocks.
>>
>> Thanks for the confirmation. This was my understanding as well.
>>
>> >> > And it fails while accessing the ACPI tables:
>> >> >
>> >> > [    0.039205] ACPI: Core revision 20170728
>> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> >
>> > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened.
>> > As ioremap() makes the mapping as "Device memory", unaligned memory
>> > access won't be allowed.
>> >
>> >> > [    0.100022] Modules linked in:
>> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> >> > pstate: 60000045
>> >> > [    0.132647] sp : ffff000008ccfb40
>> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> >> > [    0.223224] Call trace:
>> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> >> > ffff0000095e3980 ffff000008ccfbe0
>> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> >> > ffff000008ccfc50 0000000000000000
>> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> >> > 00000000ffffff76 0000000000000006
>> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> >> > 000000000000038e 0000000000000000
>> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> >> > 0000000000000005 000000000000001b
>> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> >> > ffff000009710027 0000000000000001
>> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> >> > 0000000000000000 ffff0000088be820
>> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> >> > ffff00000849b4f8 ffff000008ccfb40
>> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> >> > ffff000008ccfb40 ffff000008260a18
>> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> >> > ffff000008ccfb40 ffff0000084a6764
>> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> >> > [    0.404437] Rebooting in 10 seconds.
>> >> >
>> >> > So, I think the linear mapping done by the primary kernel does not
>> >> > make these accessible in the crash kernel directly.
>> >> >
>> >> > Any pointers?
>> >>
>> >> Can you get the code line number for acpi_ns_lookup+0x25c?
>> >
>> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
>> > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned
>> > accesses?
>> > (I didn't find out how unaligned accesses could happen there.)
>> >
>>
>> Right. Like I captured somewhere in this thread (perhaps the first
>> email on this subject),
>> this is indeed an unaligned address access.
>>
>> Now, modifying acpi_os_ioremap() to not call ioremap(), and thus avoid
>> mapping this memory range as device memory, doesn't seem a neat
>> solution, as it means we are not marking something with the right
>> memory attribute and we can run into similar/related issues later.
>>
>> Regarding the latter suggestion, what I am seeing now is that the ACPI
>> table access functions are perhaps reused from the earlier x86
>> implementation, but on the arm64 (or even arm) arch we should not be
>> allowing unaligned accesses, which might cause UNDEFINED behaviour and
>> a resultant crash.
>>
>> So I can try this approach and see if it works for me.
>>
>> However, I am still not very sure as to why the crashkernel ranges
>> historically do not include the System RAM regions (which may include
>> the ACPI regions as well). These regions are available for the kernel
>> usage and perhaps should be exported to the crashkernel as well.
>>
>> I am not fully aware of the previous discussions on capp'ing the
>> crashkernel memory being passed to the kdump kernel, but did we run
>> into any issues while doing so?
>>
>> Also, even if I extend the kexec-tools to modify the
>> linux,usable-memory-range and add the ACPI regions to it, the
>> crashkernel fails to boot with the below message (I have added some
>> logic to print the DTB on the crash kernel boot start):
>>
>> [    0.000000]     chosen {
>> [    0.000000]         linux,usable-memory-range
>> [    0.000000]  = <
>> [    0.000000] 0x00000000
>> [    0.000000] 0x0e800000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x20000000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x396c0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x000a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x39770000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00040000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x398a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00020000
>> [    0.000000] >
>> [    0.000000] ;
>>
>> [snip..]
>>
>> [    0.000000] linux,usable-memory-range base e800000, size 20000000
>> [    0.000000]  - e800000 ,  20000000
>> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
>> [    0.000000]  - 396c0000 ,  a0000
>> [    0.000000] linux,usable-memory-range base 39770000, size 40000
>> [    0.000000]  - 39770000 ,  40000
>> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
>> [    0.000000]  - 398a0000 ,  20000
>> [    0.000000] initrd not fully accessible via the linear mapping --
>> please check your bootloader ...
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [    0.000000] sp : ffff000008ccfe80
>> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [    0.000000] Call trace:
>> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [    0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [    0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [    0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [    0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [    0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [    0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [    0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [    0.000000] cma: Failed to reserve 512 MiB
>> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
>> ------------   4.14.0+ #7
>> [    0.000000] Call trace:
>> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
>  * "Virtual kernel memory layout" in dmesg
>  * /proc/iomem
>  * debug messages from kexec-tools (kexec -d)

Here are the changes I have made so far in the kernel and kexec-tools
to mark ACPI reclaim regions as identifiable regions in '/proc/iomem'
and to append them to the DTB property 'linux,usable-memory-range':

Linux patches:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>
<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>
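
To illustrate the DTB side, here is a minimal sketch of how several
ranges could be packed into the cell layout seen in the DTB dump earlier
in this thread, i.e. <base_hi base_lo size_hi size_lo> per range with
address_cells = size_cells = 2. The struct and function names are
hypothetical and this is not the code from the kexec-tools patch above.
It also assumes the crash kernel has been changed to accept more than
one range in this property.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range descriptor used only in this sketch */
struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* Store a 32-bit value in big-endian byte order, as FDT cells require */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Pack n ranges as <base_hi base_lo size_hi size_lo> cells
 * (assumes 2 address cells and 2 size cells).
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t pack_usable_ranges(const struct mem_range *r, size_t n,
                                 uint8_t *buf, size_t buflen)
{
    size_t need = n * 16;
    size_t i;

    if (need > buflen)
        return 0;

    for (i = 0; i < n; i++) {
        uint8_t *p = buf + i * 16;

        put_be32(p + 0,  (uint32_t)(r[i].base >> 32));
        put_be32(p + 4,  (uint32_t)r[i].base);
        put_be32(p + 8,  (uint32_t)(r[i].size >> 32));
        put_be32(p + 12, (uint32_t)r[i].size);
    }
    return need;
}
```

The resulting buffer could then be written under /chosen with libfdt's
fdt_setprop(), using the returned length as the property size.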

Note that I am not very clear about the hole margins that kexec-tools
adds (so that the crashkernel's expectation that the kernel image and
initrd lie within a 1G boundary is met), so I have not added my
temporary changes for that to the github code. Any suggestions on how
to correctly put them in place would be appreciated.

And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+ root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200
[    0.000000] PCIe ASPM is disabled
[    0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB) mapped at [        (ptrval)-        (ptrval)]
[    0.000000] Memory: 267251520K/268169216K available (7868K kernel code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000   (126847 GB)
[    0.000000]       .text : 0x        (ptrval) - 0x        (ptrval)   (  7872 KB)
[    0.000000]     .rodata : 0x        (ptrval) - 0x        (ptrval)   (  3392 KB)
[    0.000000]       .init : 0x        (ptrval) - 0x        (ptrval)   (  1280 KB)
[    0.000000]       .data : 0x        (ptrval) - 0x        (ptrval)   (  1765 KB)
[    0.000000]        .bss : 0x        (ptrval) - 0x        (ptrval)   (  7728 KB)
[    0.000000]     fixed   : 0xffff7fdffe7b0000 - 0xffff7fdffec00000   (  4416 KB)
[    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000   (    16 MB)
[    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000   (   128 GB maximum)
[    0.000000]               0xffff7fe000000000 - 0xffff7fe02bff0000   (   703 MB actual)
[    0.000000]     memory  : 0xffff800000000000 - 0xffff80affc000000   (720832 MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[    0.000000] ftrace: allocating 29903 entries in 8 pages
[    0.000000] Hierarchical RCU implementation.

(2) # cat /proc/iomem
00000000-3961ffff : System RAM
  00080000-00b7ffff : Kernel code
  00cc0000-0166ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : ACPI reclaim region
39760000-3976ffff : reserved
39770000-397affff : ACPI reclaim region
397b0000-3989ffff : reserved
398a0000-398bffff : ACPI reclaim region
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM
3ed30000-3ed5ffff : reserved
3ed60000-3fbfffff : System RAM
40500000-40500fff : sbsa-gwdt.0
  40500000-40500fff : sbsa-gwdt.0
40600000-40600fff : sbsa-gwdt.0
  40600000-40600fff : sbsa-gwdt.0
60080000-6008ffff : HISI0152:00
602b0000-602b0fff : ARMH0011:00
  602b0000-602b0fff : ARMH0011:00
603c0000-603cffff : HISI0141:00
  603c0000-603cffff : HISI0141:00
a0080000-a008ffff : HISI0152:05
  a0080000-a008ffff : HISI0152:04
    a0080000-a008ffff : HISI0152:03
a00a0000-a00affff : pnp 00:01
a01b0000-a01b0fff : HISI0191:00
a2000000-a200ffff : HISI0162:01
  a2000000-a200ffff : HISI0162:01
a3000000-a300ffff : HISI0162:02
  a3000000-a300ffff : HISI0162:02
a7020000-a702ffff : PNP0D20:00
  a7020000-a702ffff : PNP0D20:00
b0000000-be7fffff : PCI Bus 0002:e8
  b0000000-b06fffff : PCI Bus 0002:e9
    b0000000-b00fffff : 0002:e9:00.0
      b0000000-b00fffff : igb
    b0100000-b01fffff : 0002:e9:00.0
    b0200000-b02fffff : 0002:e9:00.1
      b0200000-b02fffff : igb
    b0300000-b03fffff : 0002:e9:00.1
    b0400000-b04fffff : 0002:e9:00.2
      b0400000-b04fffff : igb
    b0500000-b05fffff : 0002:e9:00.3
      b0500000-b05fffff : igb
    b0600000-b0603fff : 0002:e9:00.0
      b0600000-b0603fff : igb
    b0604000-b0607fff : 0002:e9:00.1
      b0604000-b0607fff : igb
    b0608000-b060bfff : 0002:e9:00.2
      b0608000-b060bfff : igb
    b060c000-b060ffff : 0002:e9:00.3
      b060c000-b060ffff : igb
  b0700000-b0afffff : PCI Bus 0002:e9
    b0700000-b077ffff : 0002:e9:00.0
    b0780000-b07fffff : 0002:e9:00.0
    b0800000-b087ffff : 0002:e9:00.1
    b0880000-b08fffff : 0002:e9:00.1
    b0900000-b097ffff : 0002:e9:00.2
    b0980000-b09fffff : 0002:e9:00.2
    b0a00000-b0a7ffff : 0002:e9:00.3
    b0a80000-b0afffff : 0002:e9:00.3
  b0b00000-b0b0ffff : 0002:e8:00.0
be800000-beffffff : PCI ECAM
c0080000-c008ffff : HISI0152:02
  c0080000-c008ffff : HISI0152:01
c3000000-c300ffff : HISI0162:00
  c3000000-c300ffff : HISI0162:00
c5000000-c588ffff : HISI00B2:00
  c5000000-c588ffff : HISI00B2:00
c7000000-c705ffff : HISI00B2:00
  c7000000-c705ffff : HISI00B2:00
d0080000-d008ffff : HISI0152:07
  d0080000-d008ffff : HISI0152:06
d0100000-d010ffff : HISI02A1:00
  d0100000-d010ffff : HISI02A1:00
400000000-4007fffff : PCI ECAM
440000000-4ffffffff : PCI Bus 0005:00
  440000000-4407fffff : PCI Bus 0005:01
    440000000-4403fffff : 0005:01:00.0
    440400000-4407fffff : 0005:01:00.1
  440800000-4421fffff : PCI Bus 0005:01
    440800000-440bfffff : 0005:01:00.0
      440800000-440bfffff : ixgbe
    440c00000-440ffffff : 0005:01:00.1
      440c00000-440ffffff : ixgbe
    441000000-4413fffff : 0005:01:00.0
    441400000-4417fffff : 0005:01:00.0
    441800000-441bfffff : 0005:01:00.1
    441c00000-441ffffff : 0005:01:00.1
    442000000-442003fff : 0005:01:00.0
      442000000-442003fff : ixgbe
    442004000-442007fff : 0005:01:00.1
      442004000-442007fff : ixgbe
  442200000-442200fff : 0005:00:00.0
700090000-70009ffff : pnp 00:03
7000a0000-7000affff : pnp 00:05
7000b0000-7000bffff : pnp 00:06
700200000-70020ffff : pnp 00:04
740800000-740ffffff : PCI ECAM
741000000-77ffeffff : PCI Bus 0006:08
  741000000-74100ffff : 0006:08:00.0
784000000-7847fffff : PCI ECAM
784800000-7bffeffff : PCI Bus 0007:40
  784800000-7849fffff : PCI Bus 0007:41
    784800000-7849fffff : 0007:41:00.0
  786000000-787ffffff : PCI Bus 0007:41
    786000000-787ffffff : 0007:41:00.0
7c4800000-7c4ffffff : PCI ECAM
7c5000000-7fffeffff : PCI Bus 0004:48
  7c5000000-7c51fffff : PCI Bus 0004:49
    7c5000000-7c50fffff : 0004:49:00.0
    7c5100000-7c513ffff : 0004:49:00.0
      7c5100000-7c513ffff : mpt3sas
    7c5140000-7c514ffff : 0004:49:00.0
      7c5140000-7c514ffff : mpt3sas
  7c5200000-7c520ffff : 0004:48:00.0
1040000000-1ffbffffff : System RAM
2000000000-2ffbffffff : System RAM
9000000000-9ffbffffff : System RAM
a000000000-affbffffff : System RAM
400c0080000-400c008ffff : HISI0152:08
600a00a0000-600a00affff : pnp 00:08
64001000000-64001ffffff : PCI ECAM
65040000000-650ffffffff : PCI Bus 000a:10
  65040000000-6504000ffff : 000a:10:00.0
700a0090000-700a009ffff : pnp 00:0a
700a0200000-700a020ffff : pnp 00:0b
74002000000-74002ffffff : PCI ECAM
75040000000-750ffffffff : PCI Bus 000c:20
  75040000000-7504000ffff : 000c:20:00.0
78003000000-78003ffffff : PCI ECAM
79040000000-790ffffffff : PCI Bus 000d:30
  79040000000-79040000fff : 000d:30:00.0

(3)

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
arch_process_options:149: command_line:
root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200
arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img
arch_process_options:152: dtb: (null)
Try gzip decompression.
kernel: 0xffff968d0010 kernel_size: 0xdf9200
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
image_arm64_load: kernel_segment: 000000000e800000
image_arm64_load: text_offset:    0000000000080000
image_arm64_load: image_size:     00000000015f0000
image_arm64_load: phys_offset:    0000000000000000
image_arm64_load: vp_offset:      ffffffffffffffff
image_arm64_load: PE format:      yes
Reserved memory range
000000000e800000-000000002e7fffff (0)
Coredump memory ranges
0000000000000000-000000000e7fffff (0)
000000002e800000-000000003961ffff (0)
0000000039d40000-000000003ed2ffff (0)
000000003ed60000-000000003fbfffff (0)
0000001040000000-0000001ffbffffff (0)
0000002000000000-0000002ffbffffff (0)
0000009000000000-0000009ffbffffff (0)
000000a000000000-000000affbffffff (0)
ACPI reclaim memory ranges
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
crashkernel memory ranges
000000000e800000-000000002e7fffff (0)
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
kernel symbol _text vaddr = ffff000008080000
load_crashdump_segments: page_offset:   ffff800000000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr =
0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr =
0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz =
0x15f0000
Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr =
0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000
Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000
p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000
Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000
p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000
Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000
p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000
Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000
p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz =
0xfbc000000
Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000
p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000
p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000
p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr =
0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr =
0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz =
0x15f0000
Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000
p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000
Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000
p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000
Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000
p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000
load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff
read_1st_dtb: found /sys/firmware/fdt
get_cells_size: #address-cells:2 #size-cells:2
cells_size_fitted: 2e7f0000-2e7f0fff
cells_size_fitted: e800000-2e7fffff
cells_size_fitted: 396c0000-3975ffff
cells_size_fitted: 39770000-397affff
cells_size_fitted: 398a0000-398bffff
 / {
    #size-cells = <0x00000002>;
    #address-cells = <0x00000002>;
    chosen {
        linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000
0x20000000 0x00000000 0x396c0000 0x00000000 0x000a0000 0x00000000
0x39770000 0x00000000 0x00040000 0x00000000 0x398a0000 0x00000000
0x00020000>;
        linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>;
        linux,uefi-mmap-desc-ver = <0x00000001>;
        linux,uefi-mmap-desc-size = <0x00000030>;
        linux,uefi-mmap-size = <0x00000e40>;
        linux,uefi-mmap-start = <0x00000000 0x30288018>;
        linux,uefi-system-table = <0x00000000 0x3ed50018>;
        bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200";
        linux,initrd-end = <0x00000000 0x2fbff9e0>;
        linux,initrd-start = <0x00000000 0x2e84d000>;
    };
 };
initrd: base fe70000, size 13b29e0h (20654560), end 112229e0
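
For anyone cross-checking the dtb dump above: the cells in linux,usable-memory-range can be decoded by hand and compared with the cells_size_fitted lines. Below is a minimal sketch, assuming the property is a flat list of (base, size) pairs encoded with the #address-cells = 2 and #size-cells = 2 shown in the dump. The helper name decode_ranges is only illustrative and is not part of kexec-tools.

```python
def decode_ranges(cells, address_cells=2, size_cells=2):
    """Group 32-bit big-endian cells into (base, size) pairs.

    Assumption: the property is a plain sequence of <base size> entries,
    each using `address_cells` + `size_cells` 32-bit words.
    """
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("cell count is not a multiple of the entry size")

    def join(words):
        # Concatenate 32-bit words, most significant word first.
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    return [
        (join(cells[i:i + address_cells]),
         join(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]


# Cells copied from the linux,usable-memory-range property in the dump.
USABLE = [
    0x00000000, 0x0e800000, 0x00000000, 0x20000000,
    0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
    0x00000000, 0x39770000, 0x00000000, 0x00040000,
    0x00000000, 0x398a0000, 0x00000000, 0x00020000,
]

for base, size in decode_ranges(USABLE):
    # Print inclusive start-end, the same form kexec uses in its debug log.
    print(f"{base:x}-{base + size - 1:x}")
```

The four ranges printed should correspond to the crashkernel region plus the three ACPI reclaim regions listed under "crashkernel memory ranges" in the log.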

[snip..]

sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c
sym: sha256_starts value: 11240eb0 addr: 11240018
machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6
sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c
sym: sha256_update value: 11245158 addr: 11240034
machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449
sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc
sym: sha256_finish value: 11245164 addr: 11240050
machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445
sym:     memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34
sym: memcmp value: 11240634 addr: 11240060
machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240070
machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240078
machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240088
machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400a8
machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400b0
machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400c0
machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400d4
machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112453a8 addr: 112400f0
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245338 addr: 112400f8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245358 addr: 11240100
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245368 addr: 11240108
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 1124536e addr: 11240110
machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245370 addr: 11240118
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 1124012c
machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106
sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4
sym: setup_arch value: 11240ea8 addr: 11240130
machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e
sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0
sym: verify_sha256_digest value: 11240000 addr: 11240134
machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3
sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4
sym: post_verification_setup_arch value: 11240ea4 addr: 11240144
machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245380 addr: 11240148
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 112401ac
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240220
machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240478
machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245392 addr: 112404b8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 11240538
machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 112405c8
machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2
sym:  purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28
sym: purgatory value: 11240120 addr: 11240678
machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa
sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8
sym: arm64_kernel_entry value: 112454c8 addr: 1124067c
machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271
sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8
sym: arm64_dtb_addr value: 112454d0 addr: 11240680
machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 112450bc
machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245118
machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245130
machine_apply_elf_rel: CALL26 aa1503e094000000->aa1503e097ffed39
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 1124513c
machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112454d8 addr: 11245330
machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8
kexec_load: entry = 0x11240670 flags = 0xb70001
nr_segments = 5
segment[0].buf   = 0xffff968d0010
segment[0].bufsz = 0xdf9200
segment[0].mem   = 0xe880000
segment[0].memsz = 0x15f0000
segment[1].buf   = 0xffff950e0010
segment[1].bufsz = 0x13b29e0
segment[1].mem   = 0xfe70000
segment[1].memsz = 0x13c0000
segment[2].buf   = 0x1115b440
segment[2].bufsz = 0x33d
segment[2].mem   = 0x11230000
segment[2].memsz = 0x10000
segment[3].buf   = 0x1115bb70
segment[3].bufsz = 0x5518
segment[3].mem   = 0x11240000
segment[3].memsz = 0x10000
segment[4].buf   = 0x11159ca0
segment[4].bufsz = 0x1000
segment[4].mem   = 0x2e7f0000
segment[4].memsz = 0x10000

Regards,
Bhupesh

>
>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 22:28                                                                       ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma,
	Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, linux-efi, Mark Rutland,
	Matt Fleming

On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Bhupesh,
>
> On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
>> >> to kexec@lists.infradead.org
>> >>
>> >> Also add linux-acpi list
>> >
>> > Thank you.
>> >
>> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> >> > <ard.biesheuvel@linaro.org> wrote:
>> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> >> > > <takahiro.akashi@linaro.org> wrote:
>> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> > >>> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> > >>> >> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> >> > Bhupesh, Ard,
>> >> > >>> >> >
>> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> > >>> >> >> Hi Ard, Akashi
>> >> > >>> >> >>
>> >> > >>> >> > (snip)
>> >> > >>> >> >
>> >> > >>> >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property to let the crash dump kernel
>> >> > >>> >> >> identify its own usable memory and exclude, at boot time, any
>> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory
>> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> > >>> >> >> for details).
>> >> > >>> >> >
>> >> > >>> >> > Right.
>> >> > >>> >> >
>> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> > >>> >> >> with the crashkernel memory range:
>> >> > >>> >> >>
>> >> > >>> >> >>                 /* add linux,usable-memory-range */
>> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> > >>> >> >>                                 address_cells, size_cells);
>> >> > >>> >> >>
>> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> > >>> >> >> , for details)
>> >> > >>> >> >>
>> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> > >>> >> >> they are marked as System RAM or as RESERVED, because the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
>> >> > >>> >> >> 'crash_reserved_mem' and not with 'system_memory_ranges'.
>> >> > >>> >> >>
>> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> >> > >>> >> >>
>> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> > >>> >> >> -r`.img --reuse-cmdline -d
>> >> > >>> >> >>
>> >> > >>> >> >> [snip..]
>> >> > >>> >> >>
>> >> > >>> >> >> Reserved memory range
>> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> Coredump memory ranges
>> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> 4). So we can either revert Ard's patch or just comment out the
>> >> > >>> >> >> capping of the memory passed to the crash kernel inside
>> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >> > >>> >> >>
>> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >> > >>> >> >> {
>> >> > >>> >> >>         struct memblock_region reg = {
>> >> > >>> >> >>                 .size = 0,
>> >> > >>> >> >>         };
>> >> > >>> >> >>
>> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> > >>> >> >>
>> >> > >>> >> >>         /* comment this out:
>> >> > >>> >> >>          * if (reg.size)
>> >> > >>> >> >>          *         memblock_cap_memory_range(reg.base, reg.size);
>> >> > >>> >> >>          */
>> >> > >>> >> >> }
>> >> > >>> >> >
>> >> > >>> >> > Please just don't do that. It can cause fatal damage to the
>> >> > >>> >> > memory contents of the *crashed* kernel.
>> >> > >>> >> >
>> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >> > >>> >> >>
>> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> > >>> >> >> fail.
>> >> > >>> >> >>
>> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> > >>> >> >> dt node 'linux,usable-memory-range'
>> >> > >>> >> >
>> >> > >>> >> > I still don't understand why we need to carry over the information
>> >> > >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
>> >> > >>> >> > such regions are free to be reused by the kernel after some point of
>> >> > >>> >> > initialization. Why does the crash dump kernel need to know about them?
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> > >>> >> kernel, those regions need to be preserved, which is why they are
>> >> > >>> >> memblock_reserve()'d now.
>> >> > >>> >
>> >> > >>> > For my better understanding, who actually accesses such regions
>> >> > >>> > during boot time, uefi itself or efistub?
>> >> > >>> >
>> >> > >>>
>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >> > >>> instance, on QEMU we have
>> >> > >>>
>> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> > >>>   01000013)
>> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>
>> >> > >>> covered by
>> >> > >>>
>> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> > >>>  ...
>> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> > >>
>> >> > >> OK. I mistakenly understood those regions could be freed after exiting
>> >> > >> UEFI boot services.
>> >> > >>
>> >> > >>>
>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >> > >>> >> when booting the next kernel.
>> >> > >>> >
>> >> > >>> > not really.
>> >> > >>> >
>> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> > >>> >> > on crash dump kernel?)
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >> > >>> >> regions only revealed the bug, not created it (given that other
>> >> > >>> >> memblock_reserve regions may be affected as well)
>> >> > >>> >
>> >> > >>> > Whether we should honor such reserved regions across kexec
>> >> > >>> > depends on each one's specific nature, so we will have to handle them
>> >> > >>> > one by one. As a matter of fact, no information about "reserved"
>> >> > >>> > memblocks is exposed to user space (via /proc/iomem).
>> >> > >>> >
>> >> > >>>
>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >> > >>> as 'System RAM'. Do you think that could solve this?
>> >> > >>
>> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> > >> marking them under another name in /proc/iomem would also be good in order
>> >> > >> not to allocate them as part of crash kernel's memory.
>> >> > >>
>> >> > >
>> >> > > I agree. However, this may not be entirely trivial, since iterating
>> >> > > over the memblock_reserved table and creating iomem entries may result
>> >> > > in collisions.
>> >> >
>> >> > I found a method (using the patch I shared earlier in this thread) to mark these
>> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> >> > reserved regions.
>> >> >
>> >> > >> But I'm still not convinced that we should export them in
>> >> > >> usable-memory-range to the crash dump kernel. They will be accessed
>> >> > >> through acpi_os_map_memory() and so won't be required to be part of
>> >> > >> system ram (or memblocks), I guess.
>> >> > >
>> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> >> > > which is exactly what we want in this case.
>> >> >
>> >> > Now this is what is confusing me. I don't see the above happening.
>> >> >
>> >> > I see that the primary kernel boots up and adds the ACPI regions via:
>> >> > acpi_os_ioremap
>> >> >     -> ioremap_cache
>> >> >
>> >> > But during the crashkernel boot, 'acpi_os_ioremap' calls
>> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> >> > variant.
>> >
>> > It is natural if that region is out of memblocks.
>>
>> Thanks for the confirmation. This was my understanding as well.
>>
>> >> > And it fails while accessing the ACPI tables:
>> >> >
>> >> > [    0.039205] ACPI: Core revision 20170728
>> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> >
>> > This (ESR = 0x96000021) means that a Data Abort caused by an alignment
>> > fault happened. As ioremap() maps the region as "Device memory",
>> > unaligned memory accesses are not allowed.
>> >
>> >> > [    0.100022] Modules linked in:
>> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> >> > pstate: 60000045
>> >> > [    0.132647] sp : ffff000008ccfb40
>> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> >> > [    0.223224] Call trace:
>> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> >> > ffff0000095e3980 ffff000008ccfbe0
>> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> >> > ffff000008ccfc50 0000000000000000
>> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> >> > 00000000ffffff76 0000000000000006
>> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> >> > 000000000000038e 0000000000000000
>> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> >> > 0000000000000005 000000000000001b
>> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> >> > ffff000009710027 0000000000000001
>> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> >> > 0000000000000000 ffff0000088be820
>> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> >> > ffff00000849b4f8 ffff000008ccfb40
>> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> >> > ffff000008ccfb40 ffff000008260a18
>> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> >> > ffff000008ccfb40 ffff0000084a6764
>> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> >> > [    0.404437] Rebooting in 10 seconds.
>> >> >
>> >> > So, I think the linear mapping done by the primary kernel does not
>> >> > make these accessible in the crash kernel directly.
>> >> >
>> >> > Any pointers?
>> >>
>> >> Can you get the code line number for acpi_ns_lookup+0x25c?
>> >
>> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
>> > modify acpi_ns_lookup() (or any other acpi functions) to prevent unaligned
>> > accesses?
>> > (I didn't find out how unaligned accesses could happen there.)
>> >
>>
>> Right. Like I captured somewhere in this thread (perhaps the first
>> email on this subject),
>> this is indeed an unaligned address access.
>>
>> Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding
>> assigning this memory range as device memory doesn't seem a neat
>> solution, as it means we are not marking something with the right
>> memory attribute and we can run into similar/related issues later.
>>
>> Regarding the latter suggestion, what I am seeing now is that the acpi
>> table access functions are perhaps reused from the earlier x86
>> implementation, but on the arm64 (or even arm) arch we should not be
>> allowing unaligned accesses which might cause UNDEFINED behaviour and
>> a resultant crash.
>>
>> So I can try this approach and see if it works for me.
>>
>> However, I am still not very sure as to why the crashkernel ranges
>> historically do not include the System RAM regions (which may include
>> the ACPI regions as well). These regions are available for the kernel
>> usage and perhaps should be exported to the crashkernel as well.
>>
>> I am not fully aware of the previous discussions on capping the
>> crashkernel memory being passed to the kdump kernel, but did we run
>> into any issues while doing so?
>>
>> Also, even if I extend the kexec-tools to modify the
>> linux,usable-memory-range and add the ACPI regions to it, the
>> crashkernel fails to boot with the below message (I have added some
>> logic to print the DTB on the crash kernel boot start):
>>
>> [    0.000000]     chosen {
>> [    0.000000]         linux,usable-memory-range
>> [    0.000000]  = <
>> [    0.000000] 0x00000000
>> [    0.000000] 0x0e800000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x20000000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x396c0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x000a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x39770000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00040000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x398a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00020000
>> [    0.000000] >
>> [    0.000000] ;
>>
>> [snip..]
>>
>> [    0.000000] linux,usable-memory-range base e800000, size 20000000
>> [    0.000000]  - e800000 ,  20000000
>> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
>> [    0.000000]  - 396c0000 ,  a0000
>> [    0.000000] linux,usable-memory-range base 39770000, size 40000
>> [    0.000000]  - 39770000 ,  40000
>> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
>> [    0.000000]  - 398a0000 ,  20000
>> [    0.000000] initrd not fully accessible via the linear mapping --
>> please check your bootloader ...
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [    0.000000] sp : ffff000008ccfe80
>> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [    0.000000] Call trace:
>> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [    0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [    0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [    0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [    0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [    0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [    0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [    0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [    0.000000] cma: Failed to reserve 512 MiB
>> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
>> ------------   4.14.0+ #7
>> [    0.000000] Call trace:
>> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
>  * "Virtual kernel memory layout" in dmesg
>  * /proc/iomem
>  * debug messages from kexec-tools (kexec -d)

Here are the changes I have made so far in the kernel and kexec-tools
to expose the ACPI reclaim regions as identifiable entries in
'/proc/iomem' and to append them to the 'linux,usable-memory-range'
DTB property:

Linux patches:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>
and
<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>

Note that I am not very clear about the hole margins that kexec-tools
adds (to satisfy the crashkernel's expectation that the kernel image
and initrd lie within a 1G boundary), so I have not added my temporary
changes for that to the github code. Any suggestions on how to put
them in place correctly would be appreciated.

And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[    0.000000] Kernel command line:
BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+
root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200
[    0.000000] PCIe ASPM is disabled
[    0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB)
mapped at [        (ptrval)-        (ptrval)]
[    0.000000] Memory: 267251520K/268169216K available (7868K kernel
code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K
reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000
(   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000
(126847 GB)
[    0.000000]       .text : 0x        (ptrval) - 0x        (ptrval)
(  7872 KB)
[    0.000000]     .rodata : 0x        (ptrval) - 0x        (ptrval)
(  3392 KB)
[    0.000000]       .init : 0x        (ptrval) - 0x        (ptrval)
(  1280 KB)
[    0.000000]       .data : 0x        (ptrval) - 0x        (ptrval)
(  1765 KB)
[    0.000000]        .bss : 0x        (ptrval) - 0x        (ptrval)
(  7728 KB)
[    0.000000]     fixed   : 0xffff7fdffe7b0000 - 0xffff7fdffec00000
(  4416 KB)
[    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000
(    16 MB)
[    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000
(   128 GB maximum)
[    0.000000]               0xffff7fe000000000 - 0xffff7fe02bff0000
(   703 MB actual)
[    0.000000]     memory  : 0xffff800000000000 - 0xffff80affc000000
(720832 MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[    0.000000] ftrace: allocating 29903 entries in 8 pages
[    0.000000] Hierarchical RCU implementation.
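
As a quick sanity check of the "memory" line above, the following
sketch applies the usual linear-map relation virt = PAGE_OFFSET +
(phys - PHYS_OFFSET). PAGE_OFFSET is read off the dmesg output;
PHYS_OFFSET = 0 is an assumption based on System RAM starting at
address 0 in the /proc/iomem listing in this mail, and ram_end is one
past the last System RAM entry there. The function name is made up for
illustration.

```python
PAGE_OFFSET = 0xffff800000000000  # start of the "memory" range in dmesg
PHYS_OFFSET = 0x0                 # assumption: RAM starts at physical 0


def phys_to_virt(paddr):
    """Linear-map translation of a physical address."""
    return PAGE_OFFSET + (paddr - PHYS_OFFSET)


# One past the end of the last System RAM range (a000000000-affbffffff).
ram_end = 0xaffbffffff + 1

print(hex(phys_to_virt(ram_end)))     # 0xffff80affc000000
print((ram_end - PHYS_OFFSET) >> 20)  # 720832 (MB)
```

Both values agree with the upper bound and the size printed on the
"memory" line.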

(2) # cat /proc/iomem
00000000-3961ffff : System RAM
  00080000-00b7ffff : Kernel code
  00cc0000-0166ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : ACPI reclaim region
39760000-3976ffff : reserved
39770000-397affff : ACPI reclaim region
397b0000-3989ffff : reserved
398a0000-398bffff : ACPI reclaim region
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM
3ed30000-3ed5ffff : reserved
3ed60000-3fbfffff : System RAM
40500000-40500fff : sbsa-gwdt.0
  40500000-40500fff : sbsa-gwdt.0
40600000-40600fff : sbsa-gwdt.0
  40600000-40600fff : sbsa-gwdt.0
60080000-6008ffff : HISI0152:00
602b0000-602b0fff : ARMH0011:00
  602b0000-602b0fff : ARMH0011:00
603c0000-603cffff : HISI0141:00
  603c0000-603cffff : HISI0141:00
a0080000-a008ffff : HISI0152:05
  a0080000-a008ffff : HISI0152:04
    a0080000-a008ffff : HISI0152:03
a00a0000-a00affff : pnp 00:01
a01b0000-a01b0fff : HISI0191:00
a2000000-a200ffff : HISI0162:01
  a2000000-a200ffff : HISI0162:01
a3000000-a300ffff : HISI0162:02
  a3000000-a300ffff : HISI0162:02
a7020000-a702ffff : PNP0D20:00
  a7020000-a702ffff : PNP0D20:00
b0000000-be7fffff : PCI Bus 0002:e8
  b0000000-b06fffff : PCI Bus 0002:e9
    b0000000-b00fffff : 0002:e9:00.0
      b0000000-b00fffff : igb
    b0100000-b01fffff : 0002:e9:00.0
    b0200000-b02fffff : 0002:e9:00.1
      b0200000-b02fffff : igb
    b0300000-b03fffff : 0002:e9:00.1
    b0400000-b04fffff : 0002:e9:00.2
      b0400000-b04fffff : igb
    b0500000-b05fffff : 0002:e9:00.3
      b0500000-b05fffff : igb
    b0600000-b0603fff : 0002:e9:00.0
      b0600000-b0603fff : igb
    b0604000-b0607fff : 0002:e9:00.1
      b0604000-b0607fff : igb
    b0608000-b060bfff : 0002:e9:00.2
      b0608000-b060bfff : igb
    b060c000-b060ffff : 0002:e9:00.3
      b060c000-b060ffff : igb
  b0700000-b0afffff : PCI Bus 0002:e9
    b0700000-b077ffff : 0002:e9:00.0
    b0780000-b07fffff : 0002:e9:00.0
    b0800000-b087ffff : 0002:e9:00.1
    b0880000-b08fffff : 0002:e9:00.1
    b0900000-b097ffff : 0002:e9:00.2
    b0980000-b09fffff : 0002:e9:00.2
    b0a00000-b0a7ffff : 0002:e9:00.3
    b0a80000-b0afffff : 0002:e9:00.3
  b0b00000-b0b0ffff : 0002:e8:00.0
be800000-beffffff : PCI ECAM
c0080000-c008ffff : HISI0152:02
  c0080000-c008ffff : HISI0152:01
c3000000-c300ffff : HISI0162:00
  c3000000-c300ffff : HISI0162:00
c5000000-c588ffff : HISI00B2:00
  c5000000-c588ffff : HISI00B2:00
c7000000-c705ffff : HISI00B2:00
  c7000000-c705ffff : HISI00B2:00
d0080000-d008ffff : HISI0152:07
  d0080000-d008ffff : HISI0152:06
d0100000-d010ffff : HISI02A1:00
  d0100000-d010ffff : HISI02A1:00
400000000-4007fffff : PCI ECAM
440000000-4ffffffff : PCI Bus 0005:00
  440000000-4407fffff : PCI Bus 0005:01
    440000000-4403fffff : 0005:01:00.0
    440400000-4407fffff : 0005:01:00.1
  440800000-4421fffff : PCI Bus 0005:01
    440800000-440bfffff : 0005:01:00.0
      440800000-440bfffff : ixgbe
    440c00000-440ffffff : 0005:01:00.1
      440c00000-440ffffff : ixgbe
    441000000-4413fffff : 0005:01:00.0
    441400000-4417fffff : 0005:01:00.0
    441800000-441bfffff : 0005:01:00.1
    441c00000-441ffffff : 0005:01:00.1
    442000000-442003fff : 0005:01:00.0
      442000000-442003fff : ixgbe
    442004000-442007fff : 0005:01:00.1
      442004000-442007fff : ixgbe
  442200000-442200fff : 0005:00:00.0
700090000-70009ffff : pnp 00:03
7000a0000-7000affff : pnp 00:05
7000b0000-7000bffff : pnp 00:06
700200000-70020ffff : pnp 00:04
740800000-740ffffff : PCI ECAM
741000000-77ffeffff : PCI Bus 0006:08
  741000000-74100ffff : 0006:08:00.0
784000000-7847fffff : PCI ECAM
784800000-7bffeffff : PCI Bus 0007:40
  784800000-7849fffff : PCI Bus 0007:41
    784800000-7849fffff : 0007:41:00.0
  786000000-787ffffff : PCI Bus 0007:41
    786000000-787ffffff : 0007:41:00.0
7c4800000-7c4ffffff : PCI ECAM
7c5000000-7fffeffff : PCI Bus 0004:48
  7c5000000-7c51fffff : PCI Bus 0004:49
    7c5000000-7c50fffff : 0004:49:00.0
    7c5100000-7c513ffff : 0004:49:00.0
      7c5100000-7c513ffff : mpt3sas
    7c5140000-7c514ffff : 0004:49:00.0
      7c5140000-7c514ffff : mpt3sas
  7c5200000-7c520ffff : 0004:48:00.0
1040000000-1ffbffffff : System RAM
2000000000-2ffbffffff : System RAM
9000000000-9ffbffffff : System RAM
a000000000-affbffffff : System RAM
400c0080000-400c008ffff : HISI0152:08
600a00a0000-600a00affff : pnp 00:08
64001000000-64001ffffff : PCI ECAM
65040000000-650ffffffff : PCI Bus 000a:10
  65040000000-6504000ffff : 000a:10:00.0
700a0090000-700a009ffff : pnp 00:0a
700a0200000-700a020ffff : pnp 00:0b
74002000000-74002ffffff : PCI ECAM
75040000000-750ffffffff : PCI Bus 000c:20
  75040000000-7504000ffff : 000c:20:00.0
78003000000-78003ffffff : PCI ECAM
79040000000-790ffffffff : PCI Bus 000d:30
  79040000000-79040000fff : 000d:30:00.0
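
For reference, the coredump ranges that kexec -d reports further down
can be reproduced from this listing by removing the "Crash kernel"
region from the "System RAM" entries. The sketch below is a simplified
illustration of that subtraction, not the kexec-tools code; the
function name exclude is made up, and all ranges are inclusive.

```python
def exclude(ranges, hole):
    """Remove the inclusive range `hole` from each inclusive (start, end)."""
    h_start, h_end = hole
    out = []
    for start, end in ranges:
        if h_end < start or h_start > end:
            out.append((start, end))  # no overlap, keep as-is
            continue
        if start < h_start:
            out.append((start, h_start - 1))  # part below the hole
        if h_end < end:
            out.append((h_end + 1, end))      # part above the hole
    return out


# "System RAM" and "Crash kernel" entries from /proc/iomem above.
system_ram = [
    (0x0000000000, 0x003961ffff),
    (0x0039d40000, 0x003ed2ffff),
    (0x003ed60000, 0x003fbfffff),
    (0x1040000000, 0x1ffbffffff),
    (0x2000000000, 0x2ffbffffff),
    (0x9000000000, 0x9ffbffffff),
    (0xa000000000, 0xaffbffffff),
]
crash_kernel = (0x0e800000, 0x2e7fffff)

coredump = exclude(system_ram, crash_kernel)
for start, end in coredump:
    print(f"{start:016x}-{end:016x}")
```

The first System RAM entry is split into 0-0e7fffff and
2e800000-3961ffff, giving eight ranges in total. The ACPI reclaim
regions are not System RAM in this listing, so they do not appear in
the result.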

(3)

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
arch_process_options:149: command_line:
root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200
arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img
arch_process_options:152: dtb: (null)
Try gzip decompression.
kernel: 0xffff968d0010 kernel_size: 0xdf9200
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
image_arm64_load: kernel_segment: 000000000e800000
image_arm64_load: text_offset:    0000000000080000
image_arm64_load: image_size:     00000000015f0000
image_arm64_load: phys_offset:    0000000000000000
image_arm64_load: vp_offset:      ffffffffffffffff
image_arm64_load: PE format:      yes
Reserved memory range
000000000e800000-000000002e7fffff (0)
Coredump memory ranges
0000000000000000-000000000e7fffff (0)
000000002e800000-000000003961ffff (0)
0000000039d40000-000000003ed2ffff (0)
000000003ed60000-000000003fbfffff (0)
0000001040000000-0000001ffbffffff (0)
0000002000000000-0000002ffbffffff (0)
0000009000000000-0000009ffbffffff (0)
000000a000000000-000000affbffffff (0)
ACPI reclaim memory ranges
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
crashkernel memory ranges
000000000e800000-000000002e7fffff (0)
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
kernel symbol _text vaddr = ffff000008080000
load_crashdump_segments: page_offset:   ffff800000000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr =
0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr =
0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz =
0x15f0000
Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr =
0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000
Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000
p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000
Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000
p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000
Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000
p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000
Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000
p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz =
0xfbc000000
Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000
p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000
p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000
p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr =
0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr =
0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz =
0x15f0000
Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000
p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000
Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000
p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000
Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000
p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000
load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff
read_1st_dtb: found /sys/firmware/fdt
get_cells_size: #address-cells:2 #size-cells:2
cells_size_fitted: 2e7f0000-2e7f0fff
cells_size_fitted: e800000-2e7fffff
cells_size_fitted: 396c0000-3975ffff
cells_size_fitted: 39770000-397affff
cells_size_fitted: 398a0000-398bffff
 / {
    #size-cells = <0x00000002>;
    #address-cells = <0x00000002>;
    chosen {
        linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000
0x20000000 0x00000000 0x396c0000 0x00000000 0x000a0000 0x00000000
0x39770000 0x00000000 0x00040000 0x00000000 0x398a0000 0x00000000
0x00020000>;
        linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>;
        linux,uefi-mmap-desc-ver = <0x00000001>;
        linux,uefi-mmap-desc-size = <0x00000030>;
        linux,uefi-mmap-size = <0x00000e40>;
        linux,uefi-mmap-start = <0x00000000 0x30288018>;
        linux,uefi-system-table = <0x00000000 0x3ed50018>;
        bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200";
        linux,initrd-end = <0x00000000 0x2fbff9e0>;
        linux,initrd-start = <0x00000000 0x2e84d000>;
    };
 };
initrd: base fe70000, size 13b29e0h (20654560), end 112229e0

[snip..]

sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c
sym: sha256_starts value: 11240eb0 addr: 11240018
machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6
sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c
sym: sha256_update value: 11245158 addr: 11240034
machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449
sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc
sym: sha256_finish value: 11245164 addr: 11240050
machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445
sym:     memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34
sym: memcmp value: 11240634 addr: 11240060
machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240070
machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240078
machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240088
machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400a8
machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400b0
machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400c0
machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400d4
machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112453a8 addr: 112400f0
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245338 addr: 112400f8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245358 addr: 11240100
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245368 addr: 11240108
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 1124536e addr: 11240110
machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245370 addr: 11240118
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 1124012c
machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106
sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4
sym: setup_arch value: 11240ea8 addr: 11240130
machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e
sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0
sym: verify_sha256_digest value: 11240000 addr: 11240134
machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3
sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4
sym: post_verification_setup_arch value: 11240ea4 addr: 11240144
machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245380 addr: 11240148
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 112401ac
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240220
machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240478
machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245392 addr: 112404b8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 11240538
machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 112405c8
machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2
sym:  purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28
sym: purgatory value: 11240120 addr: 11240678
machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa
sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8
sym: arm64_kernel_entry value: 112454c8 addr: 1124067c
machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271
sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8
sym: arm64_dtb_addr value: 112454d0 addr: 11240680
machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 112450bc
machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245118
machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245130
machine_apply_elf_rel: CALL26 aa1503e094000000->aa1503e097ffed39
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 1124513c
machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112454d8 addr: 11245330
machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8
kexec_load: entry = 0x11240670 flags = 0xb70001
nr_segments = 5
segment[0].buf   = 0xffff968d0010
segment[0].bufsz = 0xdf9200
segment[0].mem   = 0xe880000
segment[0].memsz = 0x15f0000
segment[1].buf   = 0xffff950e0010
segment[1].bufsz = 0x13b29e0
segment[1].mem   = 0xfe70000
segment[1].memsz = 0x13c0000
segment[2].buf   = 0x1115b440
segment[2].bufsz = 0x33d
segment[2].mem   = 0x11230000
segment[2].memsz = 0x10000
segment[3].buf   = 0x1115bb70
segment[3].bufsz = 0x5518
segment[3].mem   = 0x11240000
segment[3].memsz = 0x10000
segment[4].buf   = 0x11159ca0
segment[4].bufsz = 0x1000
segment[4].mem   = 0x2e7f0000
segment[4].memsz = 0x10000
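
For anyone reading the fdt dump above by hand: with #address-cells = 2
and #size-cells = 2, every four cells of 'linux,usable-memory-range'
form one <base size> pair. Below is a minimal sketch in plain C of that
decoding. It is not code from kexec-tools or the kernel; struct
mem_range, cells_to_u64(), decode_ranges() and range_end() are names
made up for illustration, and the cells are assumed to be already
converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded <base size> pair (hypothetical helper type). */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine two 32-bit cells (host byte order) into one 64-bit value. */
static uint64_t cells_to_u64(const uint32_t *cells)
{
	return ((uint64_t)cells[0] << 32) | cells[1];
}

/*
 * Decode a property laid out as repeated
 * <base_hi base_lo size_hi size_lo> cells, i.e. #address-cells = 2 and
 * #size-cells = 2. Returns the number of ranges written to out
 * (at most max_out). Trailing cells that do not form a full pair are
 * ignored.
 */
static size_t decode_ranges(const uint32_t *cells, size_t ncells,
			    struct mem_range *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i + 4 <= ncells && n < max_out; i += 4, n++) {
		out[n].base = cells_to_u64(&cells[i]);
		out[n].size = cells_to_u64(&cells[i + 2]);
	}
	return n;
}

/* Last byte covered by a range, as in the "start-end" form of the log. */
static uint64_t range_end(const struct mem_range *r)
{
	return r->base + r->size - 1;
}
```

Fed the sixteen cells from the dump, this gives four ranges whose
start-end values agree with the cells_size_fitted lines
(e800000-2e7fffff, 396c0000-3975ffff, 39770000-397affff,
398a0000-398bffff).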

Regards,
Bhupesh

>
>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 22:28                                                                       ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Bhupesh,
>
> On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> >> kexec at fedoraproject... is for Fedora kexec scripts discussion, changed it
>> >> to kexec at lists.infradead.org
>> >>
>> >> Also add linux-acpi list
>> >
>> > Thank you.
>> >
>> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> >> > <ard.biesheuvel@linaro.org> wrote:
>> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> >> > > <takahiro.akashi@linaro.org> wrote:
>> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> > >>> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> > >>> >> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> >> > Bhupesh, Ard,
>> >> > >>> >> >
>> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> > >>> >> >> Hi Ard, Akashi
>> >> > >>> >> >>
>> >> > >>> >> > (snip)
>> >> > >>> >> >
>> >> > >>> >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property to let the crash dump kernel
>> >> > >>> >> >> identify its own usable memory and exclude, at boot time, any
>> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory
>> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> > >>> >> >> for details).
>> >> > >>> >> >
>> >> > >>> >> > Right.
>> >> > >>> >> >
>> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> > >>> >> >> with the crashkernel memory range:
>> >> > >>> >> >>
>> >> > >>> >> >>                 /* add linux,usable-memory-range */
>> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> > >>> >> >>                                 address_cells, size_cells);
>> >> > >>> >> >>
>> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> > >>> >> >> , for details)
>> >> > >>> >> >>
>> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> > >>> >> >> they are marked as System RAM or as RESERVED, because the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
>> >> > >>> >> >> 'crash_reserved_mem' and not with 'system_memory_ranges'.
>> >> > >>> >> >>
>> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> >> > >>> >> >>
>> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> > >>> >> >> -r`.img --reuse-cmdline -d
>> >> > >>> >> >>
>> >> > >>> >> >> [snip..]
>> >> > >>> >> >>
>> >> > >>> >> >> Reserved memory range
>> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> Coredump memory ranges
>> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> 4). As an experiment, I either reverted Ard's patch or commented out
>> >> > >>> >> >> the capping of the memory passed to the crash kernel inside
>> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >> > >>> >> >>
>> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >> > >>> >> >> {
>> >> > >>> >> >>         struct memblock_region reg = {
>> >> > >>> >> >>                 .size = 0,
>> >> > >>> >> >>         };
>> >> > >>> >> >>
>> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> > >>> >> >>
>> >> > >>> >> >>         /* Capping commented out for this experiment:
>> >> > >>> >> >>          * if (reg.size)
>> >> > >>> >> >>          *         memblock_cap_memory_range(reg.base, reg.size);
>> >> > >>> >> >>          */
>> >> > >>> >> >> }
>> >> > >>> >> >
>> >> > >>> >> > Please just don't do that. It can fatally corrupt the memory
>> >> > >>> >> > contents of the *crashed* kernel.
>> >> > >>> >> >
>> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >> > >>> >> >>
>> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> > >>> >> >> fail.
>> >> > >>> >> >>
>> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> > >>> >> >> dt node 'linux,usable-memory-range'
>> >> > >>> >> >
>> >> > >>> >> > I still don't understand why we need to carry over the information
>> >> > >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
>> >> > >>> >> > such regions are free to be reused by the kernel after some point of
>> >> > >>> >> > initialization. Why does the crash dump kernel need to know about them?
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> > >>> >> kernel, those regions need to be preserved, which is why they are
>> >> > >>> >> memblock_reserve()'d now.
>> >> > >>> >
>> >> > >>> > For my better understanding, who actually accesses such regions
>> >> > >>> > during boot time: UEFI itself or the efistub?
>> >> > >>> >
>> >> > >>>
>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >> > >>> instance, on QEMU we have
>> >> > >>>
>> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> > >>>   01000013)
>> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>
>> >> > >>> covered by
>> >> > >>>
>> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> > >>>  ...
>> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> > >>
>> >> > >> OK. I mistakenly understood those regions could be freed after exiting
>> >> > >> UEFI boot services.
>> >> > >>
>> >> > >>>
>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >> > >>> >> when booting the next kernel.
>> >> > >>> >
>> >> > >>> > not really.
>> >> > >>> >
>> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> > >>> >> > on crash dump kernel?)
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >> > >>> >> regions only revealed the bug, not created it (given that other
>> >> > >>> >> memblock_reserve regions may be affected as well)
>> >> > >>> >
>> >> > >>> > Whether we should honor such reserved regions across kexec
>> >> > >>> > depends on each one's specific nature, so we will have to handle them
>> >> > >>> > one by one. As a matter of fact, no information about "reserved"
>> >> > >>> > memblocks is exposed to user space (via /proc/iomem).
>> >> > >>> >
>> >> > >>>
>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >> > >>> as 'System RAM'. Do you think that could solve this?
>> >> > >>
>> >> > >> Memblock-reserving them is necessary to prevent their corruption, and
>> >> > >> marking them under another name in /proc/iomem would also be good so that
>> >> > >> they are not allocated as part of the crash kernel's memory.
>> >> > >>
>> >> > >
>> >> > > I agree. However, this may not be entirely trivial, since iterating
>> >> > > over the memblock_reserved table and creating iomem entries may result
>> >> > > in collisions.
>> >> >
>> >> > I found a method (using the patch I shared earlier in this thread) to mark these
>> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> >> > reserved regions.
>> >> >
>> >> > >> But I'm still not convinced that we should export them in
>> >> > >> linux,usable-memory-range to the crash dump kernel. They will be accessed
>> >> > >> through acpi_os_map_memory() and so won't need to be part of system RAM
>> >> > >> (or memblocks), I guess.
>> >> > >
>> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> >> > > which is exactly what we want in this case.
>> >> >
>> >> > Now this is what is confusing me. I don't see the above happening.
>> >> >
>> >> > I see that the primary kernel boots up and adds the ACPI regions via:
>> >> > acpi_os_ioremap
>> >> >     -> ioremap_cache
>> >> >
>> >> > But during the crashkernel boot, 'acpi_os_ioremap' calls
>> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> >> > variant.
>> >
>> > That is expected if the region is outside the memblocks.
>>
>> Thanks for the confirmation. This was my understanding as well.
>>
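
To make the point above concrete, here is a small userspace model of
the choice being described: an address that is not covered by any known
memory region ends up with a Device mapping, otherwise with a Normal
cacheable one. This is only a sketch of the behaviour discussed in this
thread, not the kernel's acpi_os_ioremap(); struct region, is_memory()
and choose_mapping() are hypothetical names. The test address
0x39710000 is the page address visible in the pte value of the oops,
and the regions are the ones printed by kexec.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memblock "memory" region. */
struct region {
	uint64_t base;
	uint64_t size;
};

enum map_type {
	MAP_DEVICE,	/* what ioremap() gives: Device memory */
	MAP_NORMAL,	/* what ioremap_cache() gives: Normal cacheable memory */
};

/* True if phys lies inside any of the n regions. */
static bool is_memory(const struct region *regs, size_t n, uint64_t phys)
{
	for (size_t i = 0; i < n; i++)
		if (phys >= regs[i].base && phys - regs[i].base < regs[i].size)
			return true;
	return false;
}

/* Model of the decision: cacheable only for addresses known as memory. */
static enum map_type choose_mapping(const struct region *regs, size_t n,
				    uint64_t phys)
{
	return is_memory(regs, n, phys) ? MAP_NORMAL : MAP_DEVICE;
}
```

With only the crashkernel range 0xe800000-0x2e7fffff as memory, the
model picks the Device mapping for 0x39710000; once 0x396c0000-0x3975ffff
is also listed as memory, it picks the cacheable one.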
>> >> > And it fails while accessing the ACPI tables:
>> >> >
>> >> > [    0.039205] ACPI: Core revision 20170728
>> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> >
>> > This (ESR = 0x96000021) means a Data Abort caused by an alignment fault.
>> > As ioremap() maps the region as "Device memory", unaligned memory
>> > accesses are not allowed.
>> >
>> >> > [    0.100022] Modules linked in:
>> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> >> > pstate: 60000045
>> >> > [    0.132647] sp : ffff000008ccfb40
>> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> >> > [    0.223224] Call trace:
>> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> >> > ffff0000095e3980 ffff000008ccfbe0
>> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> >> > ffff000008ccfc50 0000000000000000
>> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> >> > 00000000ffffff76 0000000000000006
>> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> >> > 000000000000038e 0000000000000000
>> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> >> > 0000000000000005 000000000000001b
>> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> >> > ffff000009710027 0000000000000001
>> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> >> > 0000000000000000 ffff0000088be820
>> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> >> > ffff00000849b4f8 ffff000008ccfb40
>> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> >> > ffff000008ccfb40 ffff000008260a18
>> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> >> > ffff000008ccfb40 ffff0000084a6764
>> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> >> > [    0.404437] Rebooting in 10 seconds.
>> >> >
>> >> > So, I think the linear mapping done by the primary kernel does not
>> >> > make these accessible in the crash kernel directly.
>> >> >
>> >> > Any pointers?
>> >>
>> >> Can you get the code line number for acpi_ns_lookup+0x25c?
>> >
>> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
>> > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned
>> > accesses?
>> > (I didn't find out how unaligned accesses could happen there.)
>> >
>>
>> Right. As I noted somewhere in this thread (perhaps in the first
>> email on this subject),
>> this is indeed an unaligned address access.
>>
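
For reference, the reading of ESR = 0x96000021 can be checked with a
few lines of C. The sketch below assumes the usual ESR_ELx layout (EC in
bits [31:26], DFSC in bits [5:0]); the macro and function names are
made up for this example and are not the kernel's own definitions.

```c
#include <stdint.h>

/* Assumed ESR_ELx field layout: EC in bits [31:26], DFSC in bits [5:0]. */
#define ESR_EC_SHIFT	26
#define ESR_EC_MASK	0x3fu
#define ESR_DFSC_MASK	0x3fu

#define EC_DABT_CUR_EL	0x25u	/* Data Abort taken without a change in EL */
#define DFSC_ALIGNMENT	0x21u	/* Alignment fault */

/* Exception class of the syndrome value. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Data fault status code of the syndrome value. */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & ESR_DFSC_MASK;
}

/* Non-zero if esr describes a same-EL Data Abort due to misalignment. */
static int esr_is_alignment_dabt(uint32_t esr)
{
	return esr_ec(esr) == EC_DABT_CUR_EL &&
	       esr_dfsc(esr) == DFSC_ALIGNMENT;
}
```

For 0x96000021 this gives EC = 0x25 and DFSC = 0x21, which is the
"Data Abort and Alignment fault" reading given above.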
>> Now, modifying acpi_os_ioremap() to avoid ioremap(), so that this memory
>> range is not mapped as device memory, doesn't seem a neat solution: it
>> means we would not be marking something with the right memory attribute,
>> and we could run into similar/related issues later.
>>
>> Regarding the latter suggestion, what I am seeing now is that the ACPI
>> table access functions are perhaps reused from the earlier x86
>> implementation, but on the arm64 (or even arm) arch we should not be
>> allowing unaligned accesses, which might cause UNDEFINED behaviour and
>> a resultant crash.
>>
>> So I can try this approach and see if it works for me.
>>
>> However, I am still not very sure as to why the crashkernel ranges
>> historically do not include the System RAM regions (which may include
>> the ACPI regions as well). These regions are available for the kernel
>> usage and perhaps should be exported to the crashkernel as well.
>>
>> I am not fully aware of the previous discussions on capping the
>> crashkernel memory being passed to the kdump kernel, but did we run
>> into any issues while doing so?
>>
>> Also, even if I extend the kexec-tools to modify the
>> linux,usable-memory-range and add the ACPI regions to it, the
>> crashkernel fails to boot with the below message (I have added some
>> logic to print the DTB on the crash kernel boot start):
>>
>> [    0.000000]     chosen {
>> [    0.000000]         linux,usable-memory-range
>> [    0.000000]  = <
>> [    0.000000] 0x00000000
>> [    0.000000] 0x0e800000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x20000000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x396c0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x000a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x39770000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00040000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x398a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00020000
>> [    0.000000] >
>> [    0.000000] ;
>>
>> [snip..]
>>
>> [    0.000000] linux,usable-memory-range base e800000, size 20000000
>> [    0.000000]  - e800000 ,  20000000
>> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
>> [    0.000000]  - 396c0000 ,  a0000
>> [    0.000000] linux,usable-memory-range base 39770000, size 40000
>> [    0.000000]  - 39770000 ,  40000
>> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
>> [    0.000000]  - 398a0000 ,  20000
>> [    0.000000] initrd not fully accessible via the linear mapping --
>> please check your bootloader ...
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [    0.000000] sp : ffff000008ccfe80
>> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [    0.000000] Call trace:
>> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [    0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [    0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [    0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [    0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [    0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [    0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [    0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [    0.000000] cma: Failed to reserve 512 MiB
>> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
>> ------------   4.14.0+ #7
>> [    0.000000] Call trace:
>> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
>  * "Virtual kernel memory layout" in dmesg
>  * /proc/iomem
>  * debug messages from kexec-tools (kexec -d)

Here are the changes I have made so far in the kernel and kexec-tools
to expose the ACPI reclaim regions as identifiable entries in
'/proc/iomem' and to append them to the 'linux,usable-memory-range'
DTB property:

Linux patches:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>
and

<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>
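To make the DTB side concrete, here is a minimal sketch of how several
ranges could be packed into a 'linux,usable-memory-range' value. It is
not taken from the patches above. It assumes #address-cells and
#size-cells are both 2, so every base and size is one big-endian 64-bit
value, and the helper name encode_usable_ranges is hypothetical. As far
as I know the stock property carries a single (base, size) pair, so
whether the crash kernel accepts more than one pair depends on the
kernel patch.

```python
import struct

def encode_usable_ranges(ranges):
    """Pack inclusive (start, end) ranges as big-endian (base, size) pairs.

    Assumption: #address-cells = #size-cells = 2, i.e. 64-bit values.
    """
    blob = b""
    for start, end in ranges:
        size = end - start + 1
        blob += struct.pack(">QQ", start, size)
    return blob

# Ranges as listed under "crashkernel memory ranges" in the kexec -d output.
ranges = [
    (0x0e800000, 0x2e7fffff),  # crash kernel reservation
    (0x396c0000, 0x3975ffff),  # ACPI reclaim region
    (0x39770000, 0x397affff),  # ACPI reclaim region
    (0x398a0000, 0x398bffff),  # ACPI reclaim region
]
prop = encode_usable_ranges(ranges)
print(len(prop), prop[:16].hex())
```

Each pair takes 16 bytes, so four ranges give a 64-byte property value.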

Note that I am not yet clear about the hole margins that kexec-tools
adds to meet the crashkernel's expectation that the kernel image and
initrd lie within a 1G boundary, so I have not pushed my temporary
changes for that to the github code. Any suggestions on how to put
them in place correctly would be appreciated.
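For reference, this is a rough sketch of the placement rule as I read
it in Documentation/arm64/booting.txt: the initrd must sit entirely
inside a 1 GB aligned physical window of up to 32 GB that also covers
the kernel Image. The function name and the initrd addresses below are
made up for illustration. The image base and size are the
kernel_segment + text_offset and image_size values printed by
'kexec -d'.

```python
GiB = 1 << 30

def align_down(x, a):
    return x & ~(a - 1)

def align_up(x, a):
    return (x + a - 1) & ~(a - 1)

def initrd_window_ok(image_base, image_size, initrd_base, initrd_size,
                     window=32 * GiB):
    """True if Image and initrd fit in one 1 GiB aligned window.

    Assumption: this mirrors my reading of booting.txt, not the exact
    check done by kexec-tools or the kernel.
    """
    lo = align_down(min(image_base, initrd_base), GiB)
    hi = align_up(max(image_base + image_size,
                      initrd_base + initrd_size), GiB)
    return hi - lo <= window

image_base = 0x0e800000 + 0x80000   # kernel_segment + text_offset
image_size = 0x15f0000

# Hypothetical initrd placed inside the crash kernel reservation.
near = initrd_window_ok(image_base, image_size, 0x10000000, 0x2000000)
# Hypothetical initrd placed in the high System RAM bank at 0x1040000000.
far = initrd_window_ok(image_base, image_size, 0x1040000000, 0x2000000)
print(near, far)
```

With these numbers the first placement passes and the second fails,
because the window would have to span about 66 GB.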

And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[    0.000000] Kernel command line:
BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+
root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200
[    0.000000] PCIe ASPM is disabled
[    0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB)
mapped at [        (ptrval)-        (ptrval)]
[    0.000000] Memory: 267251520K/268169216K available (7868K kernel
code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K
reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000
(   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000
(126847 GB)
[    0.000000]       .text : 0x        (ptrval) - 0x        (ptrval)
(  7872 KB)
[    0.000000]     .rodata : 0x        (ptrval) - 0x        (ptrval)
(  3392 KB)
[    0.000000]       .init : 0x        (ptrval) - 0x        (ptrval)
(  1280 KB)
[    0.000000]       .data : 0x        (ptrval) - 0x        (ptrval)
(  1765 KB)
[    0.000000]        .bss : 0x        (ptrval) - 0x        (ptrval)
(  7728 KB)
[    0.000000]     fixed   : 0xffff7fdffe7b0000 - 0xffff7fdffec00000
(  4416 KB)
[    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000
(    16 MB)
[    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000
(   128 GB maximum)
[    0.000000]               0xffff7fe000000000 - 0xffff7fe02bff0000
(   703 MB actual)
[    0.000000]     memory  : 0xffff800000000000 - 0xffff80affc000000
(720832 MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[    0.000000] ftrace: allocating 29903 entries in 8 pages
[    0.000000] Hierarchical RCU implementation.

(2) # cat /proc/iomem
00000000-3961ffff : System RAM
  00080000-00b7ffff : Kernel code
  00cc0000-0166ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : ACPI reclaim region
39760000-3976ffff : reserved
39770000-397affff : ACPI reclaim region
397b0000-3989ffff : reserved
398a0000-398bffff : ACPI reclaim region
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM
3ed30000-3ed5ffff : reserved
3ed60000-3fbfffff : System RAM
40500000-40500fff : sbsa-gwdt.0
  40500000-40500fff : sbsa-gwdt.0
40600000-40600fff : sbsa-gwdt.0
  40600000-40600fff : sbsa-gwdt.0
60080000-6008ffff : HISI0152:00
602b0000-602b0fff : ARMH0011:00
  602b0000-602b0fff : ARMH0011:00
603c0000-603cffff : HISI0141:00
  603c0000-603cffff : HISI0141:00
a0080000-a008ffff : HISI0152:05
  a0080000-a008ffff : HISI0152:04
    a0080000-a008ffff : HISI0152:03
a00a0000-a00affff : pnp 00:01
a01b0000-a01b0fff : HISI0191:00
a2000000-a200ffff : HISI0162:01
  a2000000-a200ffff : HISI0162:01
a3000000-a300ffff : HISI0162:02
  a3000000-a300ffff : HISI0162:02
a7020000-a702ffff : PNP0D20:00
  a7020000-a702ffff : PNP0D20:00
b0000000-be7fffff : PCI Bus 0002:e8
  b0000000-b06fffff : PCI Bus 0002:e9
    b0000000-b00fffff : 0002:e9:00.0
      b0000000-b00fffff : igb
    b0100000-b01fffff : 0002:e9:00.0
    b0200000-b02fffff : 0002:e9:00.1
      b0200000-b02fffff : igb
    b0300000-b03fffff : 0002:e9:00.1
    b0400000-b04fffff : 0002:e9:00.2
      b0400000-b04fffff : igb
    b0500000-b05fffff : 0002:e9:00.3
      b0500000-b05fffff : igb
    b0600000-b0603fff : 0002:e9:00.0
      b0600000-b0603fff : igb
    b0604000-b0607fff : 0002:e9:00.1
      b0604000-b0607fff : igb
    b0608000-b060bfff : 0002:e9:00.2
      b0608000-b060bfff : igb
    b060c000-b060ffff : 0002:e9:00.3
      b060c000-b060ffff : igb
  b0700000-b0afffff : PCI Bus 0002:e9
    b0700000-b077ffff : 0002:e9:00.0
    b0780000-b07fffff : 0002:e9:00.0
    b0800000-b087ffff : 0002:e9:00.1
    b0880000-b08fffff : 0002:e9:00.1
    b0900000-b097ffff : 0002:e9:00.2
    b0980000-b09fffff : 0002:e9:00.2
    b0a00000-b0a7ffff : 0002:e9:00.3
    b0a80000-b0afffff : 0002:e9:00.3
  b0b00000-b0b0ffff : 0002:e8:00.0
be800000-beffffff : PCI ECAM
c0080000-c008ffff : HISI0152:02
  c0080000-c008ffff : HISI0152:01
c3000000-c300ffff : HISI0162:00
  c3000000-c300ffff : HISI0162:00
c5000000-c588ffff : HISI00B2:00
  c5000000-c588ffff : HISI00B2:00
c7000000-c705ffff : HISI00B2:00
  c7000000-c705ffff : HISI00B2:00
d0080000-d008ffff : HISI0152:07
  d0080000-d008ffff : HISI0152:06
d0100000-d010ffff : HISI02A1:00
  d0100000-d010ffff : HISI02A1:00
400000000-4007fffff : PCI ECAM
440000000-4ffffffff : PCI Bus 0005:00
  440000000-4407fffff : PCI Bus 0005:01
    440000000-4403fffff : 0005:01:00.0
    440400000-4407fffff : 0005:01:00.1
  440800000-4421fffff : PCI Bus 0005:01
    440800000-440bfffff : 0005:01:00.0
      440800000-440bfffff : ixgbe
    440c00000-440ffffff : 0005:01:00.1
      440c00000-440ffffff : ixgbe
    441000000-4413fffff : 0005:01:00.0
    441400000-4417fffff : 0005:01:00.0
    441800000-441bfffff : 0005:01:00.1
    441c00000-441ffffff : 0005:01:00.1
    442000000-442003fff : 0005:01:00.0
      442000000-442003fff : ixgbe
    442004000-442007fff : 0005:01:00.1
      442004000-442007fff : ixgbe
  442200000-442200fff : 0005:00:00.0
700090000-70009ffff : pnp 00:03
7000a0000-7000affff : pnp 00:05
7000b0000-7000bffff : pnp 00:06
700200000-70020ffff : pnp 00:04
740800000-740ffffff : PCI ECAM
741000000-77ffeffff : PCI Bus 0006:08
  741000000-74100ffff : 0006:08:00.0
784000000-7847fffff : PCI ECAM
784800000-7bffeffff : PCI Bus 0007:40
  784800000-7849fffff : PCI Bus 0007:41
    784800000-7849fffff : 0007:41:00.0
  786000000-787ffffff : PCI Bus 0007:41
    786000000-787ffffff : 0007:41:00.0
7c4800000-7c4ffffff : PCI ECAM
7c5000000-7fffeffff : PCI Bus 0004:48
  7c5000000-7c51fffff : PCI Bus 0004:49
    7c5000000-7c50fffff : 0004:49:00.0
    7c5100000-7c513ffff : 0004:49:00.0
      7c5100000-7c513ffff : mpt3sas
    7c5140000-7c514ffff : 0004:49:00.0
      7c5140000-7c514ffff : mpt3sas
  7c5200000-7c520ffff : 0004:48:00.0
1040000000-1ffbffffff : System RAM
2000000000-2ffbffffff : System RAM
9000000000-9ffbffffff : System RAM
a000000000-affbffffff : System RAM
400c0080000-400c008ffff : HISI0152:08
600a00a0000-600a00affff : pnp 00:08
64001000000-64001ffffff : PCI ECAM
65040000000-650ffffffff : PCI Bus 000a:10
  65040000000-6504000ffff : 000a:10:00.0
700a0090000-700a009ffff : pnp 00:0a
700a0200000-700a020ffff : pnp 00:0b
74002000000-74002ffffff : PCI ECAM
75040000000-750ffffffff : PCI Bus 000c:20
  75040000000-7504000ffff : 000c:20:00.0
78003000000-78003ffffff : PCI ECAM
79040000000-790ffffffff : PCI Bus 000d:30
  79040000000-79040000fff : 000d:30:00.0
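
The listing above is what kexec-tools walks to build its memory ranges.
Below is a small sketch of that idea, in Python rather than the C that
kexec-tools actually uses, with hypothetical helper names (parse_iomem,
subtract). It picks out the top-level "System RAM" entries and the
"ACPI reclaim region" entries (the label added by my kernel patch),
then carves the "Crash kernel" reservation out of System RAM to get
the ranges that would be dumped.

```python
import re

LINE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")

def parse_iomem(text):
    """Return (depth, start, end, name) for each /proc/iomem line."""
    entries = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        # Nesting is shown with two spaces per level.
        entries.append((len(indent) // 2, int(start, 16),
                        int(end, 16), name.strip()))
    return entries

def subtract(ranges, hole):
    """Remove one inclusive range from a list of inclusive ranges."""
    hs, he = hole
    out = []
    for s, e in ranges:
        if he < s or hs > e:
            out.append((s, e))      # no overlap, keep as is
            continue
        if s < hs:
            out.append((s, hs - 1))  # part below the hole
        if he < e:
            out.append((he + 1, e))  # part above the hole
    return out

SAMPLE = """\
00000000-3961ffff : System RAM
  00080000-00b7ffff : Kernel code
  00cc0000-0166ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : ACPI reclaim region
39760000-3976ffff : reserved
39770000-397affff : ACPI reclaim region
"""

entries = parse_iomem(SAMPLE)
ram = [(s, e) for d, s, e, n in entries if d == 0 and n == "System RAM"]
acpi = [(s, e) for d, s, e, n in entries if n == "ACPI reclaim region"]
crash = next((s, e) for d, s, e, n in entries if n == "Crash kernel")
dumpable = subtract(ram, crash)
for s, e in dumpable:
    print(f"{s:016x}-{e:016x}")
```

For this excerpt the two resulting ranges match the first two entries
that 'kexec -d' prints under "Coredump memory ranges".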

(3)

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
arch_process_options:149: command_line:
root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200
arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img
arch_process_options:152: dtb: (null)
Try gzip decompression.
kernel: 0xffff968d0010 kernel_size: 0xdf9200
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
image_arm64_load: kernel_segment: 000000000e800000
image_arm64_load: text_offset:    0000000000080000
image_arm64_load: image_size:     00000000015f0000
image_arm64_load: phys_offset:    0000000000000000
image_arm64_load: vp_offset:      ffffffffffffffff
image_arm64_load: PE format:      yes
Reserved memory range
000000000e800000-000000002e7fffff (0)
Coredump memory ranges
0000000000000000-000000000e7fffff (0)
000000002e800000-000000003961ffff (0)
0000000039d40000-000000003ed2ffff (0)
000000003ed60000-000000003fbfffff (0)
0000001040000000-0000001ffbffffff (0)
0000002000000000-0000002ffbffffff (0)
0000009000000000-0000009ffbffffff (0)
000000a000000000-000000affbffffff (0)
ACPI reclaim memory ranges
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
crashkernel memory ranges
000000000e800000-000000002e7fffff (0)
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
kernel symbol _text vaddr = ffff000008080000
load_crashdump_segments: page_offset:   ffff800000000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr =
0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr =
0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz =
0x15f0000
Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr =
0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000
Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000
p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000
Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000
p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000
Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000
p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000
Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000
p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz =
0xfbc000000
Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000
p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000
p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000
p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr =
0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr =
0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz =
0x15f0000
Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000
p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000
Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000
p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000
Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000
p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000
load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff
read_1st_dtb: found /sys/firmware/fdt
get_cells_size: #address-cells:2 #size-cells:2
cells_size_fitted: 2e7f0000-2e7f0fff
cells_size_fitted: e800000-2e7fffff
cells_size_fitted: 396c0000-3975ffff
cells_size_fitted: 39770000-397affff
cells_size_fitted: 398a0000-398bffff
 / {
    #size-cells = <0x00000002>;
    #address-cells = <0x00000002>;
    chosen {
        linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000
0x20000000 0x00000000 0x396c0000 0x00000000 0x000a0000 0x00000000
0x39770000 0x00000000 0x00040000 0x00000000 0x398a0000 0x00000000
0x00020000>;
        linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>;
        linux,uefi-mmap-desc-ver = <0x00000001>;
        linux,uefi-mmap-desc-size = <0x00000030>;
        linux,uefi-mmap-size = <0x00000e40>;
        linux,uefi-mmap-start = <0x00000000 0x30288018>;
        linux,uefi-system-table = <0x00000000 0x3ed50018>;
        bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200";
        linux,initrd-end = <0x00000000 0x2fbff9e0>;
        linux,initrd-start = <0x00000000 0x2e84d000>;
    };
 };
initrd: base fe70000, size 13b29e0h (20654560), end 112229e0

[snip..]

sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c
sym: sha256_starts value: 11240eb0 addr: 11240018
machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6
sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c
sym: sha256_update value: 11245158 addr: 11240034
machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449
sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc
sym: sha256_finish value: 11245164 addr: 11240050
machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445
sym:     memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34
sym: memcmp value: 11240634 addr: 11240060
machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240070
machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240078
machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240088
machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400a8
machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400b0
machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400c0
machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400d4
machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112453a8 addr: 112400f0
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245338 addr: 112400f8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245358 addr: 11240100
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245368 addr: 11240108
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 1124536e addr: 11240110
machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245370 addr: 11240118
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 1124012c
machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106
sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4
sym: setup_arch value: 11240ea8 addr: 11240130
machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e
sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0
sym: verify_sha256_digest value: 11240000 addr: 11240134
machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3
sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4
sym: post_verification_setup_arch value: 11240ea4 addr: 11240144
machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245380 addr: 11240148
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 112401ac
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240220
machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240478
machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245392 addr: 112404b8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 11240538
machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 112405c8
machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2
sym:  purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28
sym: purgatory value: 11240120 addr: 11240678
machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa
sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8
sym: arm64_kernel_entry value: 112454c8 addr: 1124067c
machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271
sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8
sym: arm64_dtb_addr value: 112454d0 addr: 11240680
machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 112450bc
machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245118
machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245130
machine_apply_elf_rel: CALL26 aa1503e094000000->aa1503e097ffed39
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 1124513c
machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112454d8 addr: 11245330
machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8
kexec_load: entry = 0x11240670 flags = 0xb70001
nr_segments = 5
segment[0].buf   = 0xffff968d0010
segment[0].bufsz = 0xdf9200
segment[0].mem   = 0xe880000
segment[0].memsz = 0x15f0000
segment[1].buf   = 0xffff950e0010
segment[1].bufsz = 0x13b29e0
segment[1].mem   = 0xfe70000
segment[1].memsz = 0x13c0000
segment[2].buf   = 0x1115b440
segment[2].bufsz = 0x33d
segment[2].mem   = 0x11230000
segment[2].memsz = 0x10000
segment[3].buf   = 0x1115bb70
segment[3].bufsz = 0x5518
segment[3].mem   = 0x11240000
segment[3].memsz = 0x10000
segment[4].buf   = 0x11159ca0
segment[4].bufsz = 0x1000
segment[4].mem   = 0x2e7f0000
segment[4].memsz = 0x10000

Regards,
Bhupesh

>
>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 22:28                                                                       ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma,
	Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, linux-efi, Mark Rutland,
	Matt Fleming

On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Bhupesh,
>
> On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote:
>> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
>> >> to kexec@lists.infradead.org
>> >>
>> >> Also add linux-acpi list
>> >
>> > Thank you.
>> >
>> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> >> > <ard.biesheuvel@linaro.org> wrote:
>> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro
>> >> > > <takahiro.akashi@linaro.org> wrote:
>> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> > >>> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> > >>> >> <takahiro.akashi@linaro.org> wrote:
>> >> > >>> >> > Bhupesh, Ard,
>> >> > >>> >> >
>> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> > >>> >> >> Hi Ard, Akashi
>> >> > >>> >> >>
>> >> > >>> >> > (snip)
>> >> > >>> >> >
>> >> > >>> >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property to let the crash dump kernel
>> >> > >>> >> >> identify its own usable memory and exclude, at boot time, any
>> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> > >>> >> >> , for details)
>> >> > >>> >> >
>> >> > >>> >> > Right.
>> >> > >>> >> >
>> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> > >>> >> >> with the crashkernel memory range:
>> >> > >>> >> >>
>> >> > >>> >> >>                 /* add linux,usable-memory-range */
>> >> > >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> > >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> > >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> > >>> >> >>                                 address_cells, size_cells);
>> >> > >>> >> >>
>> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> > >>> >> >> , for details)
>> >> > >>> >> >>
>> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> > >>> >> >> they are marked as System RAM or as RESERVED, because the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property is patched up only with
>> >> > >>> >> >> 'crash_reserved_mem' and not with 'system_memory_ranges'.
>> >> > >>> >> >>
>> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> > >>> >> >> ACPI memory and crashes while trying to access the same:
>> >> > >>> >> >>
>> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> > >>> >> >> -r`.img --reuse-cmdline -d
>> >> > >>> >> >>
>> >> > >>> >> >> [snip..]
>> >> > >>> >> >>
>> >> > >>> >> >> Reserved memory range
>> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> Coredump memory ranges
>> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> 4). So we can either revert Ard's patch or just comment out the
>> >> > >>> >> >> capping of the memory passed to the crash kernel inside
>> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >> > >>> >> >>
>> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >> > >>> >> >> {
>> >> > >>> >> >>         struct memblock_region reg = {
>> >> > >>> >> >>                 .size = 0,
>> >> > >>> >> >>         };
>> >> > >>> >> >>
>> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> > >>> >> >>
>> >> > >>> >> >>         /* comment this out:
>> >> > >>> >> >>         if (reg.size)
>> >> > >>> >> >>                 memblock_cap_memory_range(reg.base, reg.size);
>> >> > >>> >> >>         */
>> >> > >>> >> >> }
>> >> > >>> >> >
>> >> > >>> >> > Please just don't do that. It can fatally damage the
>> >> > >>> >> > memory contents of the *crashed* kernel.
>> >> > >>> >> >
>> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >> > >>> >> >>
>> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> > >>> >> >> fail.
>> >> > >>> >> >>
>> >> > >>> >> >> 6a). I am now trying an approach where the kernel adds the ACPI
>> >> > >>> >> >> reclaim regions to '/proc/iomem' as separate 'ACPI reclaim'
>> >> > >>> >> >> entries, and the user-space 'kexec-tools' picks those entries
>> >> > >>> >> >> up from '/proc/iomem' and adds them to the
>> >> > >>> >> >> 'linux,usable-memory-range' dt property.
>> >> > >>> >> >
>> >> > >>> >> > I still don't understand why we need to carry over the information
>> >> > >>> >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
>> >> > >>> >> > such regions are free to be reused by the kernel after some point of
>> >> > >>> >> > initialization. Why does the crash dump kernel need to know about them?
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> > >>> >> kernel, those regions need to be preserved, which is why they are
>> >> > >>> >> memblock_reserve()'d now.
>> >> > >>> >
>> >> > >>> > For my better understanding, who actually accesses such regions
>> >> > >>> > during boot time: UEFI itself or the EFI stub?
>> >> > >>> >
>> >> > >>>
>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >> > >>> instance, on QEMU we have
>> >> > >>>
>> >> > >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> > >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> > >>>   01000013)
>> >> > >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> > >>> BXPC 00000001)
>> >> > >>>
>> >> > >>> covered by
>> >> > >>>
>> >> > >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> > >>>  ...
>> >> > >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> > >>
>> >> > >> OK. I mistakenly understood those regions could be freed after exiting
>> >> > >> UEFI boot services.
>> >> > >>
>> >> > >>>
>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >> > >>> >> when booting the next kernel.
>> >> > >>> >
>> >> > >>> > not really.
>> >> > >>> >
>> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> > >>> >> > on crash dump kernel?)
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >> > >>> >> regions only revealed the bug, not created it (given that other
>> >> > >>> >> memblock_reserve regions may be affected as well)
>> >> > >>> >
>> >> > >>> > Since whether we should honor such reserved regions across kexec
>> >> > >>> > depends on each one's specific nature, we will have to handle them one by one.
>> >> > >>> > As a matter of fact, no information about "reserved" memblocks is
>> >> > >>> > exposed to user space (via /proc/iomem).
>> >> > >>> >
>> >> > >>>
>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >> > >>> as 'System RAM'. Do you think that could solve this?
>> >> > >>
>> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> > >> marking them under another name in /proc/iomem would also be good in order
>> >> > >> not to allocate them as part of crash kernel's memory.
>> >> > >>
>> >> > >
>> >> > > I agree. However, this may not be entirely trivial, since iterating
>> >> > > over the memblock_reserved table and creating iomem entries may result
>> >> > > in collisions.
>> >> >
>> >> > I found a method (using the patch I shared earlier in this thread) to mark these
>> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> >> > reserved regions.
>> >> >
>> >> > >> But I'm still not convinced that we should export them in
>> >> > >> linux,usable-memory-range to the crash dump kernel. They will be accessed
>> >> > >> through acpi_os_map_memory() and so won't need to be part of system ram
>> >> > >> (or memblocks), I guess.
>> >> > >
>> >> > > Agreed. They will be covered by the linear mapping in the boot kernel,
>> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> >> > > which is exactly what we want in this case.
>> >> >
>> >> > Now this is what is confusing me. I don't see the above happening.
>> >> >
>> >> > I see that the primary kernel boots up and adds the ACPI regions via:
>> >> > acpi_os_ioremap
>> >> >     -> ioremap_cache
>> >> >
>> >> > But during the crashkernel boot, 'acpi_os_ioremap' calls
>> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> >> > variant.
>> >
>> > It is natural if that region is out of memblocks.
>>
>> Thanks for the confirmation. This was my understanding as well.
>>
>> >> > And it fails while accessing the ACPI tables:
>> >> >
>> >> > [    0.039205] ACPI: Core revision 20170728
>> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> >> > [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> >
>> > This (ESR = 0x96000021) means that a Data Abort caused by an alignment
>> > fault happened. As ioremap() creates the mapping as "Device memory",
>> > unaligned memory accesses are not allowed.
>> >
>> >> > [    0.100022] Modules linked in:
>> >> > [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> >> > [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> >> > [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> >> > pstate: 60000045
>> >> > [    0.132647] sp : ffff000008ccfb40
>> >> > [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> >> > [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> >> > [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> >> > [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> >> > [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> >> > [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> >> > [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> >> > [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> >> > [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> >> > [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> >> > [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> >> > [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> >> > [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> >> > [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> >> > [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> >> > [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> >> > [    0.223224] Call trace:
>> >> > [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> >> > [    0.232194] fa00: 0000000000000000 ffff000009710027
>> >> > ffff0000095e3980 ffff000008ccfbe0
>> >> > [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> >> > ffff000008ccfc50 0000000000000000
>> >> > [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> >> > 00000000ffffff76 0000000000000006
>> >> > [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> >> > 000000000000038e 0000000000000000
>> >> > [    0.263843] fa80: 0000000000000000 0000000000000000
>> >> > 0000000000000005 000000000000001b
>> >> > [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> >> > ffff000009710027 0000000000000001
>> >> > [    0.279667] fac0: 0000000000000001 000000000000001b
>> >> > 0000000000000000 ffff0000088be820
>> >> > [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> >> > ffff00000849b4f8 ffff000008ccfb40
>> >> > [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> >> > ffff000008ccfb40 ffff000008260a18
>> >> > [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> >> > ffff000008ccfb40 ffff0000084a6764
>> >> > [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> >> > [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> >> > [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> >> > [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> >> > [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> >> > [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> >> > [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> >> > [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> >> > [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> >> > [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> >> > [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> >> > [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> >> > [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> >> > [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> >> > [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> >> > [    0.399160] Kernel panic - not syncing: Fatal exception
>> >> > [    0.404437] Rebooting in 10 seconds.
>> >> >
>> >> > So, I think the linear mapping done by the primary kernel does not
>> >> > make these accessible in the crash kernel directly.
>> >> >
>> >> > Any pointers?
>> >>
>> >> Can you get the code line number for acpi_ns_lookup+0x25c?
>> >
>> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or
>> > modify acpi_ns_lookup() (or any other ACPI functions) to prevent unaligned
>> > accesses?
>> > (I couldn't find out how unaligned accesses could happen there.)
>> >
>>
>> Right. As I noted somewhere in this thread (perhaps in the first
>> email on this subject),
>> this is indeed an unaligned address access.
>>
>> Now, modifying acpi_os_ioremap() to not call ioremap(), and thus avoid
>> mapping this memory range as device memory, doesn't seem a neat
>> solution, as it means we would not be marking something with the right
>> memory attribute and could run into similar/related issues later.
>>
>> Regarding the latter suggestion, what I am seeing now is that the ACPI
>> table access functions are perhaps reused from the earlier x86
>> implementation, but on the arm64 (or even arm) arch we should not be
>> allowing unaligned accesses which might cause UNDEFINED behaviour and
>> a resultant crash.
>>
>> So I can try going this approach and see if it works for me.
>>
>> However, I am still not very sure as to why the crashkernel ranges
>> historically do not include the System RAM regions (which may include
>> the ACPI regions as well). These regions are available for the kernel
>> usage and perhaps should be exported to the crashkernel as well.
>>
>> I am not fully aware of the previous discussions on capping the
>> crashkernel memory being passed to the kdump kernel, but did we run
>> into any issues while doing so?
>>
>> Also, even if I extend the kexec-tools to modify the
>> linux,usable-memory-range and add the ACPI regions to it, the
>> crashkernel fails to boot with the below message (I have added some
>> logic to print the DTB on the crash kernel boot start):
>>
>> [    0.000000]     chosen {
>> [    0.000000]         linux,usable-memory-range
>> [    0.000000]  = <
>> [    0.000000] 0x00000000
>> [    0.000000] 0x0e800000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x20000000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x396c0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x000a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x39770000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00040000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x398a0000
>> [    0.000000] 0x00000000
>> [    0.000000] 0x00020000
>> [    0.000000] >
>> [    0.000000] ;
>>
>> [snip..]
>>
>> [    0.000000] linux,usable-memory-range base e800000, size 20000000
>> [    0.000000]  - e800000 ,  20000000
>> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
>> [    0.000000]  - 396c0000 ,  a0000
>> [    0.000000] linux,usable-memory-range base 39770000, size 40000
>> [    0.000000]  - 39770000 ,  40000
>> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
>> [    0.000000]  - 398a0000 ,  20000
>> [    0.000000] initrd not fully accessible via the linear mapping --
>> please check your bootloader ...
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [    0.000000] sp : ffff000008ccfe80
>> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [    0.000000] Call trace:
>> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [    0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [    0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [    0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [    0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [    0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [    0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [    0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [    0.000000] cma: Failed to reserve 512 MiB
>> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
>> ------------   4.14.0+ #7
>> [    0.000000] Call trace:
>> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
>  * "Virtual kernel memory layout" in dmesg
>  * /proc/iomem
>  * debug messages from kexec-tools (kexec -d)

So here are the changes which I have made so far in the kernel and
kexec-tools to expose the ACPI reclaim regions as identifiable
regions in '/proc/iomem' and to append them to the DTB property
linux,usable-memory-range:

Linux patches:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>
and
<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>
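
For illustration only (this is not the code from the patches above), here
is a minimal Python sketch of the idea: pick the "Crash kernel" and
"ACPI reclaim region" entries out of /proc/iomem-style text and turn them
into the cells of linux,usable-memory-range. It assumes #address-cells = 2
and #size-cells = 2, as the DTB dump earlier in this thread suggests, and
the helper names parse_iomem() and usable_memory_cells() are hypothetical.
The "ACPI reclaim region" label only exists with the kernel patches above.

```python
import re

# One /proc/iomem line: optional indentation, "start-end : name" (hex, end inclusive).
IOMEM_LINE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+?)\s*$")


def parse_iomem(text):
    """Return a list of (start, end, name) tuples, one per parsed line."""
    entries = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return entries


def usable_memory_cells(entries, names=("Crash kernel", "ACPI reclaim region")):
    """Build 32-bit cells <addr-hi addr-lo size-hi size-lo> for each selected region.

    Assumes two address cells and two size cells (hypothetical helper).
    """
    cells = []
    for start, end, name in entries:
        if name in names:
            size = end - start + 1
            cells += [start >> 32, start & 0xFFFFFFFF, size >> 32, size & 0xFFFFFFFF]
    return cells


# Excerpt of the /proc/iomem output quoted in this mail.
SAMPLE = """\
00000000-3961ffff : System RAM
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : ACPI reclaim region
39770000-397affff : ACPI reclaim region
398a0000-398bffff : ACPI reclaim region
"""

cells = usable_memory_cells(parse_iomem(SAMPLE))
print(" ".join(f"0x{c:08x}" for c in cells))
```

Run against the excerpt above, the cells come out as four (base, size)
pairs in address order, which can be compared by hand with the
linux,usable-memory-range dump printed by the crash kernel.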

Note that I am not very clear about the hole margins that kexec-tools
adds (to meet the crashkernel's expectation that the kernel image and
initrd lie within a 1G boundary), so I have not added my temporary
changes for that to the github code. Any suggestions on how to
correctly put them in place would be appreciated.

And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[    0.000000] Kernel command line:
BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+
root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200
[    0.000000] PCIe ASPM is disabled
[    0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB)
mapped at [        (ptrval)-        (ptrval)]
[    0.000000] Memory: 267251520K/268169216K available (7868K kernel
code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K
reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000
(   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000
(126847 GB)
[    0.000000]       .text : 0x        (ptrval) - 0x        (ptrval)
(  7872 KB)
[    0.000000]     .rodata : 0x        (ptrval) - 0x        (ptrval)
(  3392 KB)
[    0.000000]       .init : 0x        (ptrval) - 0x        (ptrval)
(  1280 KB)
[    0.000000]       .data : 0x        (ptrval) - 0x        (ptrval)
(  1765 KB)
[    0.000000]        .bss : 0x        (ptrval) - 0x        (ptrval)
(  7728 KB)
[    0.000000]     fixed   : 0xffff7fdffe7b0000 - 0xffff7fdffec00000
(  4416 KB)
[    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000
(    16 MB)
[    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000
(   128 GB maximum)
[    0.000000]               0xffff7fe000000000 - 0xffff7fe02bff0000
(   703 MB actual)
[    0.000000]     memory  : 0xffff800000000000 - 0xffff80affc000000
(720832 MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[    0.000000] ftrace: allocating 29903 entries in 8 pages
[    0.000000] Hierarchical RCU implementation.

(2) # cat /proc/iomem
00000000-3961ffff : System RAM
  00080000-00b7ffff : Kernel code
  00cc0000-0166ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : ACPI reclaim region
39760000-3976ffff : reserved
39770000-397affff : ACPI reclaim region
397b0000-3989ffff : reserved
398a0000-398bffff : ACPI reclaim region
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM
3ed30000-3ed5ffff : reserved
3ed60000-3fbfffff : System RAM
40500000-40500fff : sbsa-gwdt.0
  40500000-40500fff : sbsa-gwdt.0
40600000-40600fff : sbsa-gwdt.0
  40600000-40600fff : sbsa-gwdt.0
60080000-6008ffff : HISI0152:00
602b0000-602b0fff : ARMH0011:00
  602b0000-602b0fff : ARMH0011:00
603c0000-603cffff : HISI0141:00
  603c0000-603cffff : HISI0141:00
a0080000-a008ffff : HISI0152:05
  a0080000-a008ffff : HISI0152:04
    a0080000-a008ffff : HISI0152:03
a00a0000-a00affff : pnp 00:01
a01b0000-a01b0fff : HISI0191:00
a2000000-a200ffff : HISI0162:01
  a2000000-a200ffff : HISI0162:01
a3000000-a300ffff : HISI0162:02
  a3000000-a300ffff : HISI0162:02
a7020000-a702ffff : PNP0D20:00
  a7020000-a702ffff : PNP0D20:00
b0000000-be7fffff : PCI Bus 0002:e8
  b0000000-b06fffff : PCI Bus 0002:e9
    b0000000-b00fffff : 0002:e9:00.0
      b0000000-b00fffff : igb
    b0100000-b01fffff : 0002:e9:00.0
    b0200000-b02fffff : 0002:e9:00.1
      b0200000-b02fffff : igb
    b0300000-b03fffff : 0002:e9:00.1
    b0400000-b04fffff : 0002:e9:00.2
      b0400000-b04fffff : igb
    b0500000-b05fffff : 0002:e9:00.3
      b0500000-b05fffff : igb
    b0600000-b0603fff : 0002:e9:00.0
      b0600000-b0603fff : igb
    b0604000-b0607fff : 0002:e9:00.1
      b0604000-b0607fff : igb
    b0608000-b060bfff : 0002:e9:00.2
      b0608000-b060bfff : igb
    b060c000-b060ffff : 0002:e9:00.3
      b060c000-b060ffff : igb
  b0700000-b0afffff : PCI Bus 0002:e9
    b0700000-b077ffff : 0002:e9:00.0
    b0780000-b07fffff : 0002:e9:00.0
    b0800000-b087ffff : 0002:e9:00.1
    b0880000-b08fffff : 0002:e9:00.1
    b0900000-b097ffff : 0002:e9:00.2
    b0980000-b09fffff : 0002:e9:00.2
    b0a00000-b0a7ffff : 0002:e9:00.3
    b0a80000-b0afffff : 0002:e9:00.3
  b0b00000-b0b0ffff : 0002:e8:00.0
be800000-beffffff : PCI ECAM
c0080000-c008ffff : HISI0152:02
  c0080000-c008ffff : HISI0152:01
c3000000-c300ffff : HISI0162:00
  c3000000-c300ffff : HISI0162:00
c5000000-c588ffff : HISI00B2:00
  c5000000-c588ffff : HISI00B2:00
c7000000-c705ffff : HISI00B2:00
  c7000000-c705ffff : HISI00B2:00
d0080000-d008ffff : HISI0152:07
  d0080000-d008ffff : HISI0152:06
d0100000-d010ffff : HISI02A1:00
  d0100000-d010ffff : HISI02A1:00
400000000-4007fffff : PCI ECAM
440000000-4ffffffff : PCI Bus 0005:00
  440000000-4407fffff : PCI Bus 0005:01
    440000000-4403fffff : 0005:01:00.0
    440400000-4407fffff : 0005:01:00.1
  440800000-4421fffff : PCI Bus 0005:01
    440800000-440bfffff : 0005:01:00.0
      440800000-440bfffff : ixgbe
    440c00000-440ffffff : 0005:01:00.1
      440c00000-440ffffff : ixgbe
    441000000-4413fffff : 0005:01:00.0
    441400000-4417fffff : 0005:01:00.0
    441800000-441bfffff : 0005:01:00.1
    441c00000-441ffffff : 0005:01:00.1
    442000000-442003fff : 0005:01:00.0
      442000000-442003fff : ixgbe
    442004000-442007fff : 0005:01:00.1
      442004000-442007fff : ixgbe
  442200000-442200fff : 0005:00:00.0
700090000-70009ffff : pnp 00:03
7000a0000-7000affff : pnp 00:05
7000b0000-7000bffff : pnp 00:06
700200000-70020ffff : pnp 00:04
740800000-740ffffff : PCI ECAM
741000000-77ffeffff : PCI Bus 0006:08
  741000000-74100ffff : 0006:08:00.0
784000000-7847fffff : PCI ECAM
784800000-7bffeffff : PCI Bus 0007:40
  784800000-7849fffff : PCI Bus 0007:41
    784800000-7849fffff : 0007:41:00.0
  786000000-787ffffff : PCI Bus 0007:41
    786000000-787ffffff : 0007:41:00.0
7c4800000-7c4ffffff : PCI ECAM
7c5000000-7fffeffff : PCI Bus 0004:48
  7c5000000-7c51fffff : PCI Bus 0004:49
    7c5000000-7c50fffff : 0004:49:00.0
    7c5100000-7c513ffff : 0004:49:00.0
      7c5100000-7c513ffff : mpt3sas
    7c5140000-7c514ffff : 0004:49:00.0
      7c5140000-7c514ffff : mpt3sas
  7c5200000-7c520ffff : 0004:48:00.0
1040000000-1ffbffffff : System RAM
2000000000-2ffbffffff : System RAM
9000000000-9ffbffffff : System RAM
a000000000-affbffffff : System RAM
400c0080000-400c008ffff : HISI0152:08
600a00a0000-600a00affff : pnp 00:08
64001000000-64001ffffff : PCI ECAM
65040000000-650ffffffff : PCI Bus 000a:10
  65040000000-6504000ffff : 000a:10:00.0
700a0090000-700a009ffff : pnp 00:0a
700a0200000-700a020ffff : pnp 00:0b
74002000000-74002ffffff : PCI ECAM
75040000000-750ffffffff : PCI Bus 000c:20
  75040000000-7504000ffff : 000c:20:00.0
78003000000-78003ffffff : PCI ECAM
79040000000-790ffffffff : PCI Bus 000d:30
  79040000000-79040000fff : 000d:30:00.0

(3)

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
arch_process_options:149: command_line:
root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200
arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img
arch_process_options:152: dtb: (null)
Try gzip decompression.
kernel: 0xffff968d0010 kernel_size: 0xdf9200
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
image_arm64_load: kernel_segment: 000000000e800000
image_arm64_load: text_offset:    0000000000080000
image_arm64_load: image_size:     00000000015f0000
image_arm64_load: phys_offset:    0000000000000000
image_arm64_load: vp_offset:      ffffffffffffffff
image_arm64_load: PE format:      yes
Reserved memory range
000000000e800000-000000002e7fffff (0)
Coredump memory ranges
0000000000000000-000000000e7fffff (0)
000000002e800000-000000003961ffff (0)
0000000039d40000-000000003ed2ffff (0)
000000003ed60000-000000003fbfffff (0)
0000001040000000-0000001ffbffffff (0)
0000002000000000-0000002ffbffffff (0)
0000009000000000-0000009ffbffffff (0)
000000a000000000-000000affbffffff (0)
ACPI reclaim memory ranges
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
crashkernel memory ranges
000000000e800000-000000002e7fffff (0)
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
kernel symbol _text vaddr = ffff000008080000
load_crashdump_segments: page_offset:   ffff800000000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr =
0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr =
0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz =
0x15f0000
Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr =
0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000
Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000
p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000
Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000
p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000
Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000
p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000
Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000
p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz =
0xfbc000000
Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000
p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000
p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000
p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz =
0xffc000000
get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424
Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424
Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424
Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424
Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200
p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8
vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr = 0x9fc4720000
p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024
Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000
p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000
Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000
p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000
Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000
p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000
Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000
p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000
load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff
read_1st_dtb: found /sys/firmware/fdt
get_cells_size: #address-cells:2 #size-cells:2
cells_size_fitted: 2e7f0000-2e7f0fff
cells_size_fitted: e800000-2e7fffff
cells_size_fitted: 396c0000-3975ffff
cells_size_fitted: 39770000-397affff
cells_size_fitted: 398a0000-398bffff
 / {
    #size-cells = <0x00000002>;
    #address-cells = <0x00000002>;
    chosen {
        linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000 0x20000000
                                     0x00000000 0x396c0000 0x00000000 0x000a0000
                                     0x00000000 0x39770000 0x00000000 0x00040000
                                     0x00000000 0x398a0000 0x00000000 0x00020000>;
        linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>;
        linux,uefi-mmap-desc-ver = <0x00000001>;
        linux,uefi-mmap-desc-size = <0x00000030>;
        linux,uefi-mmap-size = <0x00000e40>;
        linux,uefi-mmap-start = <0x00000000 0x30288018>;
        linux,uefi-system-table = <0x00000000 0x3ed50018>;
        bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro
earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1
pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root
rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force
console=ttyAMA0,115200";
        linux,initrd-end = <0x00000000 0x2fbff9e0>;
        linux,initrd-start = <0x00000000 0x2e84d000>;
    };
 };
initrd: base fe70000, size 13b29e0h (20654560), end 112229e0

[snip..]

sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c
sym: sha256_starts value: 11240eb0 addr: 11240018
machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6
sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c
sym: sha256_update value: 11245158 addr: 11240034
machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449
sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc
sym: sha256_finish value: 11245164 addr: 11240050
machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445
sym:     memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34
sym: memcmp value: 11240634 addr: 11240060
machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240070
machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240078
machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 11240088
machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400a8
machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400b0
machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400c0
machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 112400d4
machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112453a8 addr: 112400f0
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245338 addr: 112400f8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245358 addr: 11240100
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245368 addr: 11240108
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 1124536e addr: 11240110
machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245370 addr: 11240118
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370
sym:     printf info: 12 other: 00 shndx: 1 value: 544 size: 90
sym: printf value: 11240544 addr: 1124012c
machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106
sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4
sym: setup_arch value: 11240ea8 addr: 11240130
machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e
sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0
sym: verify_sha256_digest value: 11240000 addr: 11240134
machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3
sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4
sym: post_verification_setup_arch value: 11240ea4 addr: 11240144
machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245380 addr: 11240148
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 112401ac
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240220
machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320
sym:    putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4
sym: putchar value: 11240ea0 addr: 11240478
machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a
sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0
sym: .rodata.str1.1 value: 11245392 addr: 112404b8
machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 11240538
machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06
sym:   vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364
sym: vsprintf value: 11240150 addr: 112405c8
machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2
sym:  purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28
sym: purgatory value: 11240120 addr: 11240678
machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa
sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8
sym: arm64_kernel_entry value: 112454c8 addr: 1124067c
machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271
sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8
sym: arm64_dtb_addr value: 112454d0 addr: 11240680
machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 112450bc
machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245118
machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f
sym:     memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20
sym: memcpy value: 11240614 addr: 11245130
machine_apply_elf_rel: CALL26 aa1503e094000000->aa1503e097ffed39
sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134
sym: sha256_process value: 11240f1c addr: 1124513c
machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78
sym:      .data info: 03 other: 00 shndx: 4 value: 0 size: 0
sym: .data value: 112454d8 addr: 11245330
machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8
kexec_load: entry = 0x11240670 flags = 0xb70001
nr_segments = 5
segment[0].buf   = 0xffff968d0010
segment[0].bufsz = 0xdf9200
segment[0].mem   = 0xe880000
segment[0].memsz = 0x15f0000
segment[1].buf   = 0xffff950e0010
segment[1].bufsz = 0x13b29e0
segment[1].mem   = 0xfe70000
segment[1].memsz = 0x13c0000
segment[2].buf   = 0x1115b440
segment[2].bufsz = 0x33d
segment[2].mem   = 0x11230000
segment[2].memsz = 0x10000
segment[3].buf   = 0x1115bb70
segment[3].bufsz = 0x5518
segment[3].mem   = 0x11240000
segment[3].memsz = 0x10000
segment[4].buf   = 0x11159ca0
segment[4].bufsz = 0x1000
segment[4].mem   = 0x2e7f0000
segment[4].memsz = 0x10000
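
As a reading aid for the dtb dump earlier in this log: with #address-cells = 2
and #size-cells = 2, each entry of linux,usable-memory-range is four 32-bit
cells (base high, base low, size high, size low). Below is a minimal sketch of
decoding them, assuming the cells have already been converted to host byte
order. struct mem_range and decode_usable_ranges() are hypothetical names used
only for illustration; they are not kexec-tools or kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one decoded <base size> entry. */
struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* Combine two 32-bit cells (host byte order assumed) into one 64-bit value. */
uint64_t cells_to_u64(const uint32_t *cells)
{
    return ((uint64_t)cells[0] << 32) | cells[1];
}

/*
 * Decode a property laid out as repeated <base_hi base_lo size_hi size_lo>
 * tuples, i.e. #address-cells = 2 and #size-cells = 2.
 * Returns the number of ranges written to out (at most max_out).
 */
size_t decode_usable_ranges(const uint32_t *cells, size_t ncells,
                            struct mem_range *out, size_t max_out)
{
    size_t n = 0;

    /* A well-formed property holds a whole number of 4-cell tuples. */
    assert(ncells % 4 == 0);

    for (size_t i = 0; i + 4 <= ncells && n < max_out; i += 4, n++) {
        out[n].base = cells_to_u64(&cells[i]);
        out[n].size = cells_to_u64(&cells[i + 2]);
    }
    return n;
}
```

Applied to the sixteen cells in the dump, this yields four ranges. The first is
base 0xe800000, size 0x20000000, i.e. e800000-2e7fffff, which agrees with the
cells_size_fitted lines printed by kexec.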

Regards,
Bhupesh

>
>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>
>> >> > _______________________________________________
>> >> > kexec mailing list -- kexec@lists.fedoraproject.org
>> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18  8:59                                                                 ` Bhupesh SHARMA
  (?)
  (?)
@ 2017-12-19  5:01                                                                     ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  5:01 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, James Morse,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, Matt Fleming

On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
> 
> [snip..]
> 
> [    0.000000] linux,usable-memory-range base e800000, size 20000000
> [    0.000000]  - e800000 ,  20000000
> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
> [    0.000000]  - 396c0000 ,  a0000
> [    0.000000] linux,usable-memory-range base 39770000, size 40000
> [    0.000000]  - 39770000 ,  40000
> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
> [    0.000000]  - 398a0000 ,  20000
> [    0.000000] initrd not fully accessible via the linear mapping --
> please check your bootloader ...

This is an odd message coming from:
|void __init arm64_memblock_init(void)
|...
|
|                if (WARN(base < memblock_start_of_DRAM() ||
|                         base + size > memblock_start_of_DRAM() +
|                                       linear_region_size,
|                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {

Can you confirm which part of this condition actually fails here?
From the information you gave me, I suppose the values are:
    base: 0xfe70000
    size: 0x13c0000
    memblock_start_of_DRAM(): 0xe800000
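
For reference, the check can be replayed outside the kernel. The following is
only a sketch: initrd_outside_linear_map() is a hypothetical helper, not a
kernel function, and the linear_region_size used in the note below (1 << 47) is
an assumption based on a 48-bit VA configuration, which the 0xffff8000...
p_vaddr values in the kexec log seem to suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the two halves of the WARN() condition
 * quoted above. Returns true when the warning would fire.
 */
bool initrd_outside_linear_map(uint64_t base, uint64_t size,
                               uint64_t dram_start,
                               uint64_t linear_region_size)
{
    /* This sketch assumes neither sum wraps around 64 bits. */
    assert(base + size >= base);
    assert(dram_start + linear_region_size >= dram_start);

    return base < dram_start ||                            /* starts below DRAM */
           base + size > dram_start + linear_region_size;  /* ends past the map */
}
```

With base = 0xfe70000, size = 0x13c0000, memblock_start_of_DRAM() = 0xe800000
and the assumed linear_region_size, the helper returns false (base + size is
0x11230000), so the warning would not fire if those were the values the kernel
actually saw. The values at that point may therefore differ from the ones
supposed here.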

Thanks,
-Takahiro AKASHI


> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
> arm64_memblock_init+0x210/0x484
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
> pstate: 600000c5
> [    0.000000] sp : ffff000008ccfe80
> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
> [    0.000000] Call trace:
> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
> [    0.000000] fd40: 0000000000000056 0000000000000000
> 0000000000000000 0000000000000000
> [    0.000000] fd60: 0000000000000001 ffff000008c96360
> 000000000000000d 746f6f622072756f
> [    0.000000] fd80: ffff000008517414 00000000000000f4
> 2065687420616976 6d207261656e696c
> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
> 79206b6365686320 000000002be00842
> [    0.000000] fdc0: ffff000008d05580 0000000000000000
> 000000000c283806 ffff000008afa000
> [    0.000000] fde0: ffff000008080000 ffff000008afa000
> ffff000009680000 ffff000008ec0000
> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
> 00000000013b0000 0000000011230000
> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
> ffff000008b76984 ffff000008ccfe80
> [    0.000000] fe40: ffff000008b76984 00000000600000c5
> ffff00000959b7a8 ffff000008ec0000
> [    0.000000] fe60: ffffffffffffffff 0000000000000005
> ffff000008ccfe80 ffff000008b76984
> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x50/0x6c with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 512 MiB
> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
> 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
> ------------   4.14.0+ #7
> [    0.000000] Call trace:
> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
> allocate 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> 
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
> 
> Akashi, any pointers on this will be helpful as well.
> 
> Regards,
> Bhupesh
> 
> 
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>
> >> > _______________________________________________
> >> > kexec mailing list -- kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org
> >> > To unsubscribe send an email to kexec-leave-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-19  5:01                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  5:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
> 
> [snip..]
> 
> [    0.000000] linux,usable-memory-range base e800000, size 20000000
> [    0.000000]  - e800000 ,  20000000
> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
> [    0.000000]  - 396c0000 ,  a0000
> [    0.000000] linux,usable-memory-range base 39770000, size 40000
> [    0.000000]  - 39770000 ,  40000
> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
> [    0.000000]  - 398a0000 ,  20000
> [    0.000000] initrd not fully accessible via the linear mapping --
> please check your bootloader ...

This is an odd message coming from:
|void __init arm64_memblock_init(void)
|...
|
|                if (WARN(base < memblock_start_of_DRAM() ||
|                         base + size > memblock_start_of_DRAM() +
|                                       linear_region_size,
|                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {

Can you confirm which part of this condition actually triggers here?
From the information you gave me, I would expect
    base: 0xfe70000
    size: 0x13c0000
    memblock_start_of_DRAM(): 0xe800000
and with those values I would not expect either comparison to be true.
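
To make the arithmetic explicit, here is a minimal user-space sketch of
the same check. It is not the kernel code: the helper name and
ASSUMED_LINEAR_REGION_SIZE are made up for illustration, and the
constant assumes a 48-bit VA configuration, where the linear region
size is -PAGE_OFFSET with PAGE_OFFSET = 0xffff800000000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: 48-bit VA layout, PAGE_OFFSET = 0xffff800000000000, so the
 * linear region size (-PAGE_OFFSET) is 0x0000800000000000.
 */
#define ASSUMED_LINEAR_REGION_SIZE UINT64_C(0x0000800000000000)

/*
 * Hypothetical helper mirroring the WARN() condition quoted above.
 * Returns true when the initrd [base, base + size) is not fully covered
 * by the linear mapping that starts at dram_start.
 */
static bool initrd_outside_linear_map(uint64_t base, uint64_t size,
				      uint64_t dram_start,
				      uint64_t linear_region_size)
{
	/* Guard against wrap-around in the two additions below. */
	assert(size <= UINT64_MAX - base);
	assert(linear_region_size <= UINT64_MAX - dram_start);

	return base < dram_start ||
	       base + size > dram_start + linear_region_size;
}
```

With base 0xfe70000, size 0x13c0000 and a DRAM start of 0xe800000 the
helper returns false, since base + size is only 0x11230000. Under these
assumptions the warning would need memblock_start_of_DRAM() to be above
the initrd base, so printing the three values at that point would help.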

Thanks,
-Takahiro AKASHI


> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
> arm64_memblock_init+0x210/0x484
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
> pstate: 600000c5
> [    0.000000] sp : ffff000008ccfe80
> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
> [    0.000000] Call trace:
> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
> [    0.000000] fd40: 0000000000000056 0000000000000000
> 0000000000000000 0000000000000000
> [    0.000000] fd60: 0000000000000001 ffff000008c96360
> 000000000000000d 746f6f622072756f
> [    0.000000] fd80: ffff000008517414 00000000000000f4
> 2065687420616976 6d207261656e696c
> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
> 79206b6365686320 000000002be00842
> [    0.000000] fdc0: ffff000008d05580 0000000000000000
> 000000000c283806 ffff000008afa000
> [    0.000000] fde0: ffff000008080000 ffff000008afa000
> ffff000009680000 ffff000008ec0000
> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
> 00000000013b0000 0000000011230000
> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
> ffff000008b76984 ffff000008ccfe80
> [    0.000000] fe40: ffff000008b76984 00000000600000c5
> ffff00000959b7a8 ffff000008ec0000
> [    0.000000] fe60: ffffffffffffffff 0000000000000005
> ffff000008ccfe80 ffff000008b76984
> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x50/0x6c with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
> [    0.000000] cma: Failed to reserve 512 MiB
> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
> 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
> ------------   4.14.0+ #7
> [    0.000000] Call trace:
> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
> allocate 0x0000000000010000 bytes below 0x0000000000000000.
> [    0.000000]
> 
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
> 
> Akashi, any pointers on this will be helpful as well.
> 
> Regards,
> Bhupesh
> 
> 
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>
> >> > _______________________________________________
> >> > kexec mailing list -- kexec at lists.fedoraproject.org
> >> > To unsubscribe send an email to kexec-leave at lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18 21:28                                                               ` Bhupesh Sharma
  (?)
  (?)
@ 2017-12-19  5:25                                                                 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  5:25 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Dave Young, Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

On Tue, Dec 19, 2017 at 02:58:20AM +0530, Bhupesh Sharma wrote:
> Hi Dave,
> 
> On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> > to kexec@lists.infradead.org
> >
> > Also add linux-acpi list
> > On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >> > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> <takahiro.akashi@linaro.org> wrote:
> >> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> >> <takahiro.akashi@linaro.org> wrote:
> >> >>> >> > Bhupesh, Ard,
> >> >>> >> >
> >> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> >> >> Hi Ard, Akashi
> >> >>> >> >>
> >> >>> >> > (snip)
> >> >>> >> >
> >> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >>> >> >> , for details)
> >> >>> >> >
> >> >>> >> > Right.
> >> >>> >> >
> >> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> >> >> with the crashkernel memory range:
> >> >>> >> >>
> >> >>> >> >>                 /* add linux,usable-memory-range */
> >> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> >> >>                                 address_cells, size_cells);
> >> >>> >> >>
> >> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >>> >> >> , for details)
> >> >>> >> >>
> >> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> >> >> they are marked as System RAM or as RESERVED. As,
> >> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> >> >>
> >> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> >> >> ACPI memory and crashes while trying to access the same:
> >> >>> >> >>
> >> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >>> >> >> -r`.img --reuse-cmdline -d
> >> >>> >> >>
> >> >>> >> >> [snip..]
> >> >>> >> >>
> >> >>> >> >> Reserved memory range
> >> >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> >> >>
> >> >>> >> >> Coredump memory ranges
> >> >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> >> >> 000000002e800000-000000003961ffff (0)
> >> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> >> >> 000000a000000000-000000affbffffff (0)
> >> >>> >> >>
> >> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >>> >> >> memory cap'ing passed to the crash kernel inside
> >> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> >> >>
> >> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> >> >> {
> >> >>> >> >>         struct memblock_region reg = {
> >> >>> >> >>                 .size = 0,
> >> >>> >> >>         };
> >> >>> >> >>
> >> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> >> >>
> >> >>> >> >>         if (reg.size)
> >> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >>> >> >> comment this out */
> >> >>> >> >> }
> >> >>> >> >
> >> >>> >> > Please just don't do that. It can cause a fatal damage on
> >> >>> >> > memory contents of the *crashed* kernel.
> >> >>> >> >
> >> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> >> >>
> >> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> >> >> fail.
> >> >>> >> >>
> >> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >>> >> >> dt node 'linux,usable-memory-range'
> >> >>> >> >
> >> >>> >> > I still don't understand why we need to carry over the information
> >> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >>> >> > such regions are free to be reused by the kernel after some point of
> >> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> >> >
> >> >>> >>
> >> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> >> kernel, those regions needs to be preserved, which is why they are
> >> >>> >> memblock_reserve()'d now.
> >> >>> >
> >> >>> > For my better understandings, who is actually accessing such regions
> >> >>> > during boot time, uefi itself or efistub?
> >> >>> >
> >> >>>
> >> >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> instance, on QEMU we have
> >> >>>
> >> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >>>   01000013)
> >> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >>> BXPC 00000001)
> >> >>>
> >> >>> covered by
> >> >>>
> >> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>>  ...
> >> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>
> >> >> OK. I mistakenly understood those regions could be freed after exiting
> >> >> UEFI boot services.
> >> >>
> >> >>>
> >> >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> >>> >> when booting the next kernel.
> >> >>> >
> >> >>> > not really.
> >> >>> >
> >> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >>> >> > on crash dump kernel?)
> >> >>> >> >
> >> >>> >>
> >> >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> >>> >> regions only revealed the bug, not created it (given that other
> >> >>> >> memblock_reserve regions may be affected as well)
> >> >>> >
> >> >>> > As whether we should honor such reserved regions over kexec'ing
> >> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >> >>> > As a matter of fact, no information about "reserved" memblocks is
> >> >>> > exposed to user space (via proc/iomem).
> >> >>> >
> >> >>>
> >> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> >>> as 'System RAM'. Do you think that could solve this?
> >> >>
> >> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> marking them under another name in /proc/iomem would also be good in order
> >> >> not to allocate them as part of crash kernel's memory.
> >> >>
> >> >
> >> > I agree. However, this may not be entirely trivial, since iterating
> >> > over the memblock_reserved table and creating iomem entries may result
> >> > in collisions.
> >>
> >> I found a method (using the patch I shared earlier in this thread) to mark these
> >> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> reserved regions.
> >>
> >> >> But I'm not still convinced that we should export them in useable-
> >> >> memory-range to crash dump kernel. They will be accessed through
> >> >> acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> (or memblocks), I guess.
> >> >
> >> > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > which is exactly what we want in this case.
> >>
> >> Now this is what is confusing me. I don't see the above happening.
> >>
> >> I see that the primary kernel boots up and adds the ACPI regions via:
> >> acpi_os_ioremap
> >>     -> ioremap_cache
> >>
> >> But during the crashkernel boot, ''acpi_os_ioremap' calls
> >> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> variant.
> >>
> >> And it fails while accessing the ACPI tables:
> >>
> >> [    0.039205] ACPI: Core revision 20170728
> >> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> >> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> >> [    0.100022] Modules linked in:
> >> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> >> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> >> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> >> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> >> pstate: 60000045
> >> [    0.132647] sp : ffff000008ccfb40
> >> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> >> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> >> [    0.146718] x25: 000000000000001b x24: 0000000000000001
> >> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> >> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> >> [    0.162812] x19: 000000000000001b x18: 0000000000000005
> >> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> >> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> >> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> >> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> >> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> >> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> >> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> >> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> >> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> >> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> >> [    0.223224] Call trace:
> >> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> >> [    0.232194] fa00: 0000000000000000 ffff000009710027
> >> ffff0000095e3980 ffff000008ccfbe0
> >> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> >> ffff000008ccfc50 0000000000000000
> >> [    0.248018] fa40: ffff8000126d0140 000000000000005f
> >> 00000000ffffff76 0000000000000006
> >> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> >> 000000000000038e 0000000000000000
> >> [    0.263843] fa80: 0000000000000000 0000000000000000
> >> 0000000000000005 000000000000001b
> >> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> >> ffff000009710027 0000000000000001
> >> [    0.279667] fac0: 0000000000000001 000000000000001b
> >> 0000000000000000 ffff0000088be820
> >> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> >> ffff00000849b4f8 ffff000008ccfb40
> >> [    0.295491] fb00: ffff0000084a6764 0000000060000045
> >> ffff000008ccfb40 ffff000008260a18
> >> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> >> ffff000008ccfb40 ffff0000084a6764
> >> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> >> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> >> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> >> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> >> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> >> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> >> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> >> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> >> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> >> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> >> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> >> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> >> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> >> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> >> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> >> [    0.399160] Kernel panic - not syncing: Fatal exception
> >> [    0.404437] Rebooting in 10 seconds.
> >>
> >> So, I think the linear mapping done by the primary kernel does not
> >> make these accessible in the crash kernel directly.
> >>
> >> Any pointers?
> >
> > Can you get the code line number for acpi_ns_lookup+0x25c?
> 
> gdb points to the following code line number:
> 
> (gdb) list *(acpi_ns_lookup+0x25c)
> 0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
> 572                }
> 573            }
> 574
> 575            /* Extract one ACPI name from the front of the pathname */
> 576
> 577            ACPI_MOVE_32_TO_32(&simple_name, path);
> 578
> 579            /* Try to find the single (4 character) ACPI name */
> 580
> 581            status =
> (gdb)
> 
> i.e. ACPI_MOVE_32_TO_32(&simple_name, path);

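This macro can be defined in two ways, depending on
ACPI_MISALIGNMENT_NOT_SUPPORTED, in drivers/acpi/acpica/acmacros.h:
either as a single 32-bit access or as a byte-by-byte copy.
So, in principle, any use of ioremap() in acpi_os_ioremap() may
conflict with the single-access definition, since 'path' is not
guaranteed to be 4-byte aligned and ioremap() does not give us a
normal memory mapping.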
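
As a rough illustration of the two forms (the function names below are
hypothetical stand-ins, not the ACPICA macros themselves), the
difference is between one 32-bit load and four byte loads:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the single-access form: one 32-bit load from src.
 * Only safe when src is 4-byte aligned, which this sketch enforces.
 */
static void move32_direct(uint32_t *dst, const void *src)
{
	assert(((uintptr_t)src & 3) == 0);
	*dst = *(const uint32_t *)src;
}

/*
 * Stand-in for the form used when misaligned accesses are not
 * supported: copy one byte at a time, so src may have any alignment.
 */
static void move32_bytewise(uint32_t *dst, const void *src)
{
	const uint8_t *s = src;
	uint8_t *d = (uint8_t *)dst;

	d[0] = s[0];
	d[1] = s[1];
	d[2] = s[2];
	d[3] = s[3];
}
```

In the oops above, x22 (ffff000009710027) is not 4-byte aligned, which
is the case where only the byte-wise form is safe on a mapping that
does not tolerate unaligned accesses.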

This suggests that, under the current code base, we must expose
ACPI reclaim regions as memblocks (i.e. via usable-memory-range)
in order to avoid the reported issue.
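
A small model of why that follows, assuming (my reading of the arm64
code, to be double-checked) that acpi_os_ioremap() uses ioremap_cache()
only when the address is known to memblock as memory and falls back to
ioremap() otherwise. All names here are hypothetical; the ranges come
from the logs in this thread, and 0x39710000 is the page from the
faulting pte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MAP_DEVICE models ioremap(), MAP_CACHED models ioremap_cache(). */
enum map_type { MAP_DEVICE, MAP_CACHED };

struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Simplified stand-in for memblock_is_memory(). */
static bool is_memory(const struct mem_range *r, size_t n, uint64_t phys)
{
	for (size_t i = 0; i < n; i++)
		if (phys >= r[i].base && phys - r[i].base < r[i].size)
			return true;
	return false;
}

/* Assumed decision: cached mapping only for addresses inside memblock. */
static enum map_type acpi_map_type(const struct mem_range *r, size_t n,
				   uint64_t phys)
{
	return is_memory(r, n, phys) ? MAP_CACHED : MAP_DEVICE;
}
```

With only the crashkernel range (base e800000, size 20000000) in
usable-memory-range, 0x39710000 comes out as MAP_DEVICE. Adding the
range at 396c0000 with size a0000 turns it into MAP_CACHED.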

Thanks,
-Takahiro AKASHI

> addr2line also confirms the same:
> 
> # addr2line -e  vmlinux ffff0000084aa250
> /root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577
> 
> 
> Regards,
> Bhupesh
> 
> 
> >>
> >> Regards,
> >> Bhupesh
> >>
> >> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >> via a kernel command line parameter, "memmap=".
> >> >>
> >> _______________________________________________
> >> kexec mailing list -- kexec@lists.fedoraproject.org
> >> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-19  5:25                                                                 ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  5:25 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Dave Young, Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

On Tue, Dec 19, 2017 at 02:58:20AM +0530, Bhupesh Sharma wrote:
> Hi Dave,
> 
> On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> > to kexec@lists.infradead.org
> >
> > Also add linux-acpi list
> > On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >> > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> <takahiro.akashi@linaro.org> wrote:
> >> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> >> <takahiro.akashi@linaro.org> wrote:
> >> >>> >> > Bhupesh, Ard,
> >> >>> >> >
> >> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> >> >> Hi Ard, Akashi
> >> >>> >> >>
> >> >>> >> > (snip)
> >> >>> >> >
> >> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >>> >> >> , for details)
> >> >>> >> >
> >> >>> >> > Right.
> >> >>> >> >
> >> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> >> >> with the crashkernel memory range:
> >> >>> >> >>
> >> >>> >> >>                 /* add linux,usable-memory-range */
> >> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> >> >>                                 address_cells, size_cells);
> >> >>> >> >>
> >> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >>> >> >> , for details)
> >> >>> >> >>
> >> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> >> >> they are marked as System RAM or as RESERVED. As,
> >> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> >> >>
> >> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> >> >> ACPI memory and crashes while trying to access the same:
> >> >>> >> >>
> >> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >>> >> >> -r`.img --reuse-cmdline -d
> >> >>> >> >>
> >> >>> >> >> [snip..]
> >> >>> >> >>
> >> >>> >> >> Reserved memory range
> >> >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> >> >>
> >> >>> >> >> Coredump memory ranges
> >> >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> >> >> 000000002e800000-000000003961ffff (0)
> >> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> >> >> 000000a000000000-000000affbffffff (0)
> >> >>> >> >>
> >> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >>> >> >> memory cap'ing passed to the crash kernel inside
> >> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> >> >>
> >> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> >> >> {
> >> >>> >> >>         struct memblock_region reg = {
> >> >>> >> >>                 .size = 0,
> >> >>> >> >>         };
> >> >>> >> >>
> >> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> >> >>
> >> >>> >> >>         if (reg.size)
> >> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >>> >> >> comment this out */
> >> >>> >> >> }
> >> >>> >> >
> >> >>> >> > Please just don't do that. It can cause a fatal damage on
> >> >>> >> > memory contents of the *crashed* kernel.
> >> >>> >> >
> >> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> >> >>
> >> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> >> >> fail.
> >> >>> >> >>
> >> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >>> >> >> dt node 'linux,usable-memory-range'
> >> >>> >> >
> >> >>> >> > I still don't understand why we need to carry over the information
> >> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >>> >> > such regions are free to be reused by the kernel after some point of
> >> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> >> >
> >> >>> >>
> >> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> >> kernel, those regions needs to be preserved, which is why they are
> >> >>> >> memblock_reserve()'d now.
> >> >>> >
> >> >>> > For my better understandings, who is actually accessing such regions
> >> >>> > during boot time, uefi itself or efistub?
> >> >>> >
> >> >>>
> >> >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> instance, on QEMU we have
> >> >>>
> >> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >>>   01000013)
> >> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >>> BXPC 00000001)
> >> >>>
> >> >>> covered by
> >> >>>
> >> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>>  ...
> >> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>
> >> >> OK. I mistakenly understood those regions could be freed after exiting
> >> >> UEFI boot services.
> >> >>
> >> >>>
> >> >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> >>> >> when booting the next kernel.
> >> >>> >
> >> >>> > not really.
> >> >>> >
> >> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >>> >> > on crash dump kernel?)
> >> >>> >> >
> >> >>> >>
> >> >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> >>> >> regions only revealed the bug, not created it (given that other
> >> >>> >> memblock_reserve regions may be affected as well)
> >> >>> >
> >> >>> > As whether we should honor such reserved regions over kexec'ing
> >> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >> >>> > As a matter of fact, no information about "reserved" memblocks is
> >> >>> > exposed to user space (via proc/iomem).
> >> >>> >
> >> >>>
> >> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> >>> as 'System RAM'. Do you think that could solve this?
> >> >>
> >> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> marking them under another name in /proc/iomem would also be good in order
> >> >> not to allocate them as part of crash kernel's memory.
> >> >>
> >> >
> >> > I agree. However, this may not be entirely trivial, since iterating
> >> > over the memblock_reserved table and creating iomem entries may result
> >> > in collisions.
> >>
> >> I found a method (using the patch I shared earlier in this thread) to mark these
> >> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> reserved regions.
> >>
> >> >> But I'm not still convinced that we should export them in useable-
> >> >> memory-range to crash dump kernel. They will be accessed through
> >> >> acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> (or memblocks), I guess.
> >> >
> >> > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > which is exactly what we want in this case.
> >>
> >> Now this is what is confusing me. I don't see the above happening.
> >>
> >> I see that the primary kernel boots up and adds the ACPI regions via:
> >> acpi_os_ioremap
> >>     -> ioremap_cache
> >>
> >> But during the crashkernel boot, 'acpi_os_ioremap' calls
> >> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> variant.
> >>
> >> And it fails while accessing the ACPI tables:
> >>
> >> [    0.039205] ACPI: Core revision 20170728
> >> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> >> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> >> [    0.100022] Modules linked in:
> >> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> >> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> >> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> >> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> >> pstate: 60000045
> >> [    0.132647] sp : ffff000008ccfb40
> >> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> >> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> >> [    0.146718] x25: 000000000000001b x24: 0000000000000001
> >> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> >> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> >> [    0.162812] x19: 000000000000001b x18: 0000000000000005
> >> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> >> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> >> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> >> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> >> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> >> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> >> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> >> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> >> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> >> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> >> [    0.223224] Call trace:
> >> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> >> [    0.232194] fa00: 0000000000000000 ffff000009710027
> >> ffff0000095e3980 ffff000008ccfbe0
> >> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> >> ffff000008ccfc50 0000000000000000
> >> [    0.248018] fa40: ffff8000126d0140 000000000000005f
> >> 00000000ffffff76 0000000000000006
> >> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> >> 000000000000038e 0000000000000000
> >> [    0.263843] fa80: 0000000000000000 0000000000000000
> >> 0000000000000005 000000000000001b
> >> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> >> ffff000009710027 0000000000000001
> >> [    0.279667] fac0: 0000000000000001 000000000000001b
> >> 0000000000000000 ffff0000088be820
> >> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> >> ffff00000849b4f8 ffff000008ccfb40
> >> [    0.295491] fb00: ffff0000084a6764 0000000060000045
> >> ffff000008ccfb40 ffff000008260a18
> >> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> >> ffff000008ccfb40 ffff0000084a6764
> >> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> >> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> >> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> >> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> >> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> >> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> >> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> >> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> >> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> >> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> >> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> >> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> >> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> >> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> >> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> >> [    0.399160] Kernel panic - not syncing: Fatal exception
> >> [    0.404437] Rebooting in 10 seconds.
> >>
> >> So, I think the linear mapping done by the primary kernel does not
> >> make these accessible in the crash kernel directly.
> >>
> >> Any pointers?
> >
> > Can you get the code line number for acpi_ns_lookup+0x25c?
> 
> gdb points to the following code line number:
> 
> (gdb) list *(acpi_ns_lookup+0x25c)
> 0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
> 572                }
> 573            }
> 574
> 575            /* Extract one ACPI name from the front of the pathname */
> 576
> 577            ACPI_MOVE_32_TO_32(&simple_name, path);
> 578
> 579            /* Try to find the single (4 character) ACPI name */
> 580
> 581            status =
> (gdb)
> 
> i.e. ACPI_MOVE_32_TO_32(&simple_name, path);

This macro can be defined in two ways, depending on
ACPI_MISALIGNMENT_NOT_SUPPORTED, in drivers/acpi/acpica/acmacros.h.
So, in principle, any use of ioremap() in acpi_os_ioremap() may
conflict with whichever definition is in effect here.
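
As a rough illustration of the two forms (a paraphrase under assumptions,
not the text of the ACPICA header; the function names move32_direct and
move32_bytewise are made up for this sketch), the difference is between
one 32-bit access and four single-byte accesses:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the form used when misaligned access is
 * assumed to be supported: a single 32-bit load and store. Both
 * pointers must be 4-byte aligned unless the hardware tolerates
 * unaligned access to the memory type behind them.
 */
static void move32_direct(uint32_t *dst, const uint32_t *src)
{
	*dst = *src;
}

/*
 * Hypothetical stand-in for the form used when misaligned access is
 * not supported: four byte copies, which carry no alignment
 * requirement on either pointer.
 */
static void move32_bytewise(void *dst, const void *src)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	d[0] = s[0];
	d[1] = s[1];
	d[2] = s[2];
	d[3] = s[3];
}
```

The direct form is only safe when the source is 4-byte aligned or the
mapping tolerates unaligned access. A pathname pointer into an AML
stream, like the one used at nsaccess.c:577, carries no such guarantee.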

This suggests that, under the current code base, we must expose
ACPI reclaim regions as memblocks (i.e. via usable-memory-range)
in order to avoid the reported issue.
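
As a cross-check, the oops quoted above can be decoded by hand. The
sketch below assumes the ESR_ELx field layout from the Arm architecture
manual (EC in bits [31:26], fault status code in bits [5:0] for data
aborts) and assumes x22 holds the address being loaded; the helper
names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define EC_DABT_CUR_EL	0x25	/* data abort taken without a change in EL */
#define DFSC_ALIGNMENT	0x21	/* alignment fault */

/* Exception class: ESR_ELx bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data fault status code: ESR_ELx bits [5:0] (valid for data aborts). */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* True if the syndrome describes a kernel-mode alignment fault. */
static bool is_kernel_alignment_fault(uint32_t esr)
{
	return esr_ec(esr) == EC_DABT_CUR_EL &&
	       esr_dfsc(esr) == DFSC_ALIGNMENT;
}

/* True if addr is a multiple of 4, as a 32-bit device access requires. */
static bool is_aligned4(uint64_t addr)
{
	return (addr & 0x3) == 0;
}
```

With the values from the log, esr_ec(0x96000021) is 0x25 and
esr_dfsc(0x96000021) is 0x21, i.e. an alignment fault, and
0xffff000009710027 is not 4-byte aligned. That would fit a 32-bit load
through a mapping that does not tolerate unaligned access.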

Thanks,
-Takahiro AKASHI

> addr2line also confirms the same:
> 
> # addr2line -e  vmlinux ffff0000084aa250
> /root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577
> 
> 
> Regards,
> Bhupesh
> 
> 
> >>
> >> Regards,
> >> Bhupesh
> >>
> >> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >> via a kernel command line parameter, "memmap=".
> >> >>
> >> _______________________________________________
> >> kexec mailing list -- kexec@lists.fedoraproject.org
> >> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-19  5:25                                                                 ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  5:25 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec,
	linux-kernel, linux-acpi, James Morse, Bhupesh SHARMA,
	Dave Young, linux-arm-kernel

On Tue, Dec 19, 2017 at 02:58:20AM +0530, Bhupesh Sharma wrote:
> Hi Dave,
> 
> On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> > to kexec@lists.infradead.org
> >
> > Also add linux-acpi list
> > On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >> > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> <takahiro.akashi@linaro.org> wrote:
> >> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> >> <takahiro.akashi@linaro.org> wrote:
> >> >>> >> > Bhupesh, Ard,
> >> >>> >> >
> >> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> >> >> Hi Ard, Akashi
> >> >>> >> >>
> >> >>> >> > (snip)
> >> >>> >> >
> >> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >>> >> >> , for details)
> >> >>> >> >
> >> >>> >> > Right.
> >> >>> >> >
> >> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> >> >> with the crashkernel memory range:
> >> >>> >> >>
> >> >>> >> >>                 /* add linux,usable-memory-range */
> >> >>> >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> >> >>                                 address_cells, size_cells);
> >> >>> >> >>
> >> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >>> >> >> , for details)
> >> >>> >> >>
> >> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> >> >> they are marked as System RAM or as RESERVED. As,
> >> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> >> >>
> >> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> >> >> ACPI memory and crashes while trying to access the same:
> >> >>> >> >>
> >> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >>> >> >> -r`.img --reuse-cmdline -d
> >> >>> >> >>
> >> >>> >> >> [snip..]
> >> >>> >> >>
> >> >>> >> >> Reserved memory range
> >> >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> >> >>
> >> >>> >> >> Coredump memory ranges
> >> >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> >> >> 000000002e800000-000000003961ffff (0)
> >> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> >> >> 000000a000000000-000000affbffffff (0)
> >> >>> >> >>
> >> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >>> >> >> memory cap'ing passed to the crash kernel inside
> >> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> >> >>
> >> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> >> >> {
> >> >>> >> >>         struct memblock_region reg = {
> >> >>> >> >>                 .size = 0,
> >> >>> >> >>         };
> >> >>> >> >>
> >> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> >> >>
> >> >>> >> >>         if (reg.size)
> >> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >>> >> >> comment this out */
> >> >>> >> >> }
> >> >>> >> >
> >> >>> >> > Please just don't do that. It can cause a fatal damage on
> >> >>> >> > memory contents of the *crashed* kernel.
> >> >>> >> >
> >> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> >> >>
> >> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> >> >> fail.
> >> >>> >> >>
> >> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >>> >> >> dt node 'linux,usable-memory-range'
> >> >>> >> >
> >> >>> >> > I still don't understand why we need to carry over the information
> >> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >>> >> > such regions are free to be reused by the kernel after some point of
> >> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> >> >
> >> >>> >>
> >> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> >> kernel, those regions needs to be preserved, which is why they are
> >> >>> >> memblock_reserve()'d now.
> >> >>> >
> >> >>> > For my better understandings, who is actually accessing such regions
> >> >>> > during boot time, uefi itself or efistub?
> >> >>> >
> >> >>>
> >> >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> instance, on QEMU we have
> >> >>>
> >> >>>  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>>  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >>>   01000013)
> >> >>>  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >>> BXPC 00000001)
> >> >>>  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >>> BXPC 00000001)
> >> >>>
> >> >>> covered by
> >> >>>
> >> >>>  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>>  ...
> >> >>>  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>
> >> >> OK. I mistakenly understood those regions could be freed after exiting
> >> >> UEFI boot services.
> >> >>
> >> >>>
> >> >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> >>> >> when booting the next kernel.
> >> >>> >
> >> >>> > not really.
> >> >>> >
> >> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >>> >> > on crash dump kernel?)
> >> >>> >> >
> >> >>> >>
> >> >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> >>> >> regions only revealed the bug, not created it (given that other
> >> >>> >> memblock_reserve regions may be affected as well)
> >> >>> >
> >> >>> > As whether we should honor such reserved regions over kexec'ing
> >> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >> >>> > As a matter of fact, no information about "reserved" memblocks is
> >> >>> > exposed to user space (via proc/iomem).
> >> >>> >
> >> >>>
> >> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> >>> as 'System RAM'. Do you think that could solve this?
> >> >>
> >> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> marking them under another name in /proc/iomem would also be good in order
> >> >> not to allocate them as part of crash kernel's memory.
> >> >>
> >> >
> >> > I agree. However, this may not be entirely trivial, since iterating
> >> > over the memblock_reserved table and creating iomem entries may result
> >> > in collisions.
> >>
> >> I found a method (using the patch I shared earlier in this thread) to mark these
> >> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> reserved regions.
> >>
> >> >> But I'm not still convinced that we should export them in useable-
> >> >> memory-range to crash dump kernel. They will be accessed through
> >> >> acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> (or memblocks), I guess.
> >> >
> >> > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > which is exactly what we want in this case.
> >>
> >> Now this is what is confusing me. I don't see the above happening.
> >>
> >> I see that the primary kernel boots up and adds the ACPI regions via:
> >> acpi_os_ioremap
> >>     -> ioremap_cache
> >>
> >> But during the crashkernel boot, ''acpi_os_ioremap' calls
> >> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> variant.
> >>
> >> And it fails while accessing the ACPI tables:
> >>
> >> [    0.039205] ACPI: Core revision 20170728
> >> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> >> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
> >> [    0.100022] Modules linked in:
> >> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> >> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> >> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> >> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> >> pstate: 60000045
> >> [    0.132647] sp : ffff000008ccfb40
> >> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> >> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
> >> [    0.146718] x25: 000000000000001b x24: 0000000000000001
> >> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
> >> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> >> [    0.162812] x19: 000000000000001b x18: 0000000000000005
> >> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
> >> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
> >> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> >> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> >> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> >> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> >> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> >> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> >> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> >> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> >> [    0.223224] Call trace:
> >> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> >> [    0.232194] fa00: 0000000000000000 ffff000009710027
> >> ffff0000095e3980 ffff000008ccfbe0
> >> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
> >> ffff000008ccfc50 0000000000000000
> >> [    0.248018] fa40: ffff8000126d0140 000000000000005f
> >> 00000000ffffff76 0000000000000006
> >> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
> >> 000000000000038e 0000000000000000
> >> [    0.263843] fa80: 0000000000000000 0000000000000000
> >> 0000000000000005 000000000000001b
> >> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
> >> ffff000009710027 0000000000000001
> >> [    0.279667] fac0: 0000000000000001 000000000000001b
> >> 0000000000000000 ffff0000088be820
> >> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
> >> ffff00000849b4f8 ffff000008ccfb40
> >> [    0.295491] fb00: ffff0000084a6764 0000000060000045
> >> ffff000008ccfb40 ffff000008260a18
> >> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
> >> ffff000008ccfb40 ffff0000084a6764
> >> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> >> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> >> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> >> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> >> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> >> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> >> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> >> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> >> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> >> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> >> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> >> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> >> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> >> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> >> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
> >> [    0.399160] Kernel panic - not syncing: Fatal exception
> >> [    0.404437] Rebooting in 10 seconds.
> >>
> >> So, I think the linear mapping done by the primary kernel does not
> >> make these accessible in the crash kernel directly.
> >>
> >> Any pointers?
> >
> > Can you get the code line number for acpi_ns_lookup+0x25c?
> 
> gdb points to the following code line number:
> 
> (gdb) list *(acpi_ns_lookup+0x25c)
> 0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
> 572                }
> 573            }
> 574
> 575            /* Extract one ACPI name from the front of the pathname */
> 576
> 577            ACPI_MOVE_32_TO_32(&simple_name, path);
> 578
> 579            /* Try to find the single (4 character) ACPI name */
> 580
> 581            status =
> (gdb)
> 
> i.e. ACPI_MOVE_32_TO_32(&simple_name, path);

This macro can be defined in two ways, depending on
ACPI_MISALIGNMENT_NOT_SUPPORTED in drivers/acpi/acpica/acmacros.h.
So, in principle, any use of ioremap() in acpi_os_ioremap() may
conflict with whichever definition is selected here.
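
To make the two styles concrete, here is a minimal user-space sketch.
The helper names (is_aligned4, move32_direct, move32_bytewise) are made
up for illustration and are not the ACPICA macros themselves; the sketch
only assumes that one style does a single 32-bit access and the other
copies byte by byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the ACPICA definitions. */

/* Returns nonzero if p is 4-byte aligned. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/*
 * Style 1: one 32-bit load and one 32-bit store.
 * Only safe when both pointers are 4-byte aligned, or when the
 * underlying memory type tolerates unaligned accesses.
 */
static void move32_direct(uint32_t *dst, const uint32_t *src)
{
    assert(is_aligned4(dst) && is_aligned4(src));
    *dst = *src;
}

/* Style 2: four single-byte copies; works for any alignment. */
static void move32_bytewise(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    d[0] = s[0];
    d[1] = s[1];
    d[2] = s[2];
    d[3] = s[3];
}
```

In the register dump above, x22 and x1 both hold ffff000009710027. If
that is the 'path' pointer (an assumption on my side), its low two bits
are non-zero, so only the byte-wise style would be alignment-safe on a
mapping that does not permit unaligned accesses.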

This suggests that, under the current code base, we must expose
ACPI reclaim regions as memblocks (i.e. via usable-memory-range)
in order to avoid the reported issue.
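
As a cross-check, the Oops code 96000021 quoted above can be decoded by
hand. The sketch below assumes the ARMv8 ESR_ELx layout (exception
class in bits [31:26], data fault status code in bits [5:0]); the
helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helpers; field positions follow the ARMv8 ESR_ELx layout. */

/* Exception class: bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
    return (esr >> 26) & 0x3f;
}

/* Data fault status code: bits [5:0]. */
static unsigned int esr_dfsc(uint32_t esr)
{
    return esr & 0x3f;
}

/*
 * EC 0x25 is a data abort taken without a change in exception level,
 * and DFSC 0x21 is an alignment fault.
 */
static int esr_is_kernel_alignment_fault(uint32_t esr)
{
    return esr_ec(esr) == 0x25 && esr_dfsc(esr) == 0x21;
}
```

For 0x96000021 this gives EC 0x25 and DFSC 0x21, i.e. an alignment
fault in kernel mode, which fits an unaligned 32-bit access to a
region mapped with ioremap().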

Thanks,
-Takahiro AKASHI

> addr2line also confirms the same:
> 
> # addr2line -e  vmlinux ffff0000084aa250
> /root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577
> 
> 
> Regards,
> Bhupesh
> 
> 
> >>
> >> Regards,
> >> Bhupesh
> >>
> >> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >> via a kernel command line parameter, "memmap=".
> >> >>
> >> _______________________________________________
> >> kexec mailing list -- kexec@lists.fedoraproject.org
> >> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18  5:40                                                       ` Dave Young
@ 2017-12-19  6:09                                                           ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  6:09 UTC (permalink / raw)
  To: Dave Young
  Cc: Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > > >> > Bhupesh, Ard,
> > > >> >
> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > > >> >> Hi Ard, Akashi
> > > >> >>
> > > >> > (snip)
> > > >> >
> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > > >> >> identify its own usable memory and exclude, at its boot time, any
> > > >> >> other memory areas that are part of the panicked kernel's memory.
> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > > >> >> , for details)
> > > >> >
> > > >> > Right.
> > > >> >
> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > > >> >> with the crashkernel memory range:
> > > >> >>
> > > >> >>                 /* add linux,usable-memory-range */
> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > > >> >>                                 address_cells, size_cells);
> > > >> >>
> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > > >> >> , for details)
> > > >> >>
> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > > >> >> they are marked as System RAM or as RESERVED. As,
> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > > >> >>
> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > > >> >> ACPI memory and crashes while trying to access the same:
> > > >> >>
> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> > > >> >> -r`.img --reuse-cmdline -d
> > > >> >>
> > > >> >> [snip..]
> > > >> >>
> > > >> >> Reserved memory range
> > > >> >> 000000000e800000-000000002e7fffff (0)
> > > >> >>
> > > >> >> Coredump memory ranges
> > > >> >> 0000000000000000-000000000e7fffff (0)
> > > >> >> 000000002e800000-000000003961ffff (0)
> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> > > >> >> 000000003ed60000-000000003fbfffff (0)
> > > >> >> 0000001040000000-0000001ffbffffff (0)
> > > >> >> 0000002000000000-0000002ffbffffff (0)
> > > >> >> 0000009000000000-0000009ffbffffff (0)
> > > >> >> 000000a000000000-000000affbffffff (0)
> > > >> >>
> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > > >> >> memory cap'ing passed to the crash kernel inside
> > > >> >> 'arch/arm64/mm/init.c' (see below):
> > > >> >>
> > > >> >> static void __init fdt_enforce_memory_region(void)
> > > >> >> {
> > > >> >>         struct memblock_region reg = {
> > > >> >>                 .size = 0,
> > > >> >>         };
> > > >> >>
> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > >> >>
> > > >> >>         if (reg.size)
> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> > > >> >> comment this out */
> > > >> >> }
> > > >> >
> > > >> > Please just don't do that. It can cause a fatal damage on
> > > >> > memory contents of the *crashed* kernel.
> > > >> >
> > > >> >> 5). Both the above temporary solutions fix the problem.
> > > >> >>
> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > > >> >> fail.
> > > >> >>
> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > > >> >> dt node 'linux,usable-memory-range'
> > > >> >
> > > >> > I still don't understand why we need to carry over the information
> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > > >> > such regions are free to be reused by the kernel after some point of
> > > >> > initialization. Why does crash dump kernel need to know about them?
> > > >> >
> > > >>
> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> > > >> kernel, those regions needs to be preserved, which is why they are
> > > >> memblock_reserve()'d now.
> > > >
> > > > For my better understandings, who is actually accessing such regions
> > > > during boot time, uefi itself or efistub?
> > > >
> > > 
> > > No, only the kernel. This is where the ACPI tables are stored. For
> > > instance, on QEMU we have
> > > 
> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> > >   01000013)
> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> > > BXPC 00000001)
> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> > > BXPC 00000001)
> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> > > BXPC 00000001)
> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> > > BXPC 00000001)
> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> > > BXPC 00000001)
> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> > > BXPC 00000001)
> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> > > BXPC 00000001)
> > > 
> > > covered by
> > > 
> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >  ...
> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > 
> > OK. I mistakenly understood those regions could be freed after exiting
> > UEFI boot services.
> > 
> > > 
> > > >> So it seems that kexec does not honour the memblock_reserve() table
> > > >> when booting the next kernel.
> > > >
> > > > not really.
> > > >
> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> > > >> > on crash dump kernel?)
> > > >> >
> > > >>
> > > >> I don't think so. And the change to the handling of ACPI reclaim
> > > >> regions only revealed the bug, not created it (given that other
> > > >> memblock_reserve regions may be affected as well)
> > > >
> > > > As whether we should honor such reserved regions over kexec'ing
> > > > depends on each one's specific nature, we will have to take care one-by-one.
> > > > As a matter of fact, no information about "reserved" memblocks is
> > > > exposed to user space (via proc/iomem).
> > > >
> > > 
> > > That is why I suggested (somewhere in this thread?) to not expose them
> > > as 'System RAM'. Do you think that could solve this?
> > 
> > Memblock-reserv'ing them is necessary to prevent their corruption and
> > marking them under another name in /proc/iomem would also be good in order
> > not to allocate them as part of crash kernel's memory.
> > 
> > But I'm not still convinced that we should export them in useable-
> > memory-range to crash dump kernel. They will be accessed through
> > acpi_os_map_memory() and so won't be required to be part of system ram
> > (or memblocks), I guess.
> > 	-> Bhupesh?
> 
> I forgot how arm64 kernel retrieve the memory ranges and initialize
> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> the memory according to the efi memmap?  For kdump kernel anything other
> than usable memory (which is from the dt node instead) should be
> reinitialized according to efi passed info, no?

All the regions exported in the EFI memmap are added to memblock.memory
in (u)efi_init() and then trimmed down to the exact range specified as
usable-memory-range by fdt_enforce_memory_region().

I have just noticed that the current fdt_enforce_memory_region() may not
work well with multiple entries in usable-memory-range.
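
A rough user-space sketch of that trimming step, to show what gets
lost. The struct and function names are hypothetical, the logic is a
simplification that ignores region flags such as NOMAP, and it is not
the kernel's memblock implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a memory region table. */
struct region {
    uint64_t base;
    uint64_t size;
};

/*
 * Trim regs[] in place so that only the parts overlapping
 * [base, base + size) remain. Returns the new number of entries.
 */
static size_t cap_to_range(struct region *regs, size_t n,
                           uint64_t base, uint64_t size)
{
    uint64_t end = base + size;
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t rb = regs[i].base;
        uint64_t re = rb + regs[i].size;
        uint64_t lo = rb > base ? rb : base;
        uint64_t hi = re < end ? re : end;

        if (lo < hi) {
            regs[out].base = lo;
            regs[out].size = hi - lo;
            out++;
        }
    }
    return out;
}

/* Returns nonzero if addr falls inside any of the n regions. */
static int region_contains(const struct region *regs, size_t n,
                           uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= regs[i].base && addr - regs[i].base < regs[i].size)
            return 1;
    return 0;
}
```

With the reserved range from the kexec output above (base 0x0e800000,
size 0x20000000) as the only cap, anything outside it is dropped from
the table, including the page at 0x39710000 that the pte in the Oops
points to. A cap that takes a single (base, size) pair has no way to
keep a second range.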

> > 
> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > via a kernel command line parameter, "memmap=".
> 
> memmap= is only used in old kexec-tools, now we are passing them via
> e820 table.

Thanks. I remember that you have explained it before.

-Takahiro AKASHI

> [snip]
> 
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-19  6:09                                                           ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  6:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > > >> <takahiro.akashi@linaro.org> wrote:
> > > >> > Bhupesh, Ard,
> > > >> >
> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > > >> >> Hi Ard, Akashi
> > > >> >>
> > > >> > (snip)
> > > >> >
> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > > >> >> identify its own usable memory and exclude, at its boot time, any
> > > >> >> other memory areas that are part of the panicked kernel's memory.
> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > > >> >> , for details)
> > > >> >
> > > >> > Right.
> > > >> >
> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > > >> >> with the crashkernel memory range:
> > > >> >>
> > > >> >>                 /* add linux,usable-memory-range */
> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > > >> >>                                 address_cells, size_cells);
> > > >> >>
> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > > >> >> , for details)
> > > >> >>
> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > > >> >> they are marked as System RAM or as RESERVED. As,
> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > > >> >>
> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > > >> >> ACPI memory and crashes while trying to access the same:
> > > >> >>
> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> > > >> >> -r`.img --reuse-cmdline -d
> > > >> >>
> > > >> >> [snip..]
> > > >> >>
> > > >> >> Reserved memory range
> > > >> >> 000000000e800000-000000002e7fffff (0)
> > > >> >>
> > > >> >> Coredump memory ranges
> > > >> >> 0000000000000000-000000000e7fffff (0)
> > > >> >> 000000002e800000-000000003961ffff (0)
> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> > > >> >> 000000003ed60000-000000003fbfffff (0)
> > > >> >> 0000001040000000-0000001ffbffffff (0)
> > > >> >> 0000002000000000-0000002ffbffffff (0)
> > > >> >> 0000009000000000-0000009ffbffffff (0)
> > > >> >> 000000a000000000-000000affbffffff (0)
> > > >> >>
> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > > >> >> memory cap'ing passed to the crash kernel inside
> > > >> >> 'arch/arm64/mm/init.c' (see below):
> > > >> >>
> > > >> >> static void __init fdt_enforce_memory_region(void)
> > > >> >> {
> > > >> >>         struct memblock_region reg = {
> > > >> >>                 .size = 0,
> > > >> >>         };
> > > >> >>
> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > >> >>
> > > >> >>         if (reg.size)
> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> > > >> >> comment this out */
> > > >> >> }
> > > >> >
> > > >> > Please just don't do that. It can cause a fatal damage on
> > > >> > memory contents of the *crashed* kernel.
> > > >> >
> > > >> >> 5). Both the above temporary solutions fix the problem.
> > > >> >>
> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > > >> >> fail.
> > > >> >>
> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > > >> >> dt node 'linux,usable-memory-range'
> > > >> >
> > > >> > I still don't understand why we need to carry over the information
> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > > >> > such regions are free to be reused by the kernel after some point of
> > > >> > initialization. Why does crash dump kernel need to know about them?
> > > >> >
> > > >>
> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> > > >> kernel, those regions needs to be preserved, which is why they are
> > > >> memblock_reserve()'d now.
> > > >
> > > > For my better understandings, who is actually accessing such regions
> > > > during boot time, uefi itself or efistub?
> > > >
> > > 
> > > No, only the kernel. This is where the ACPI tables are stored. For
> > > instance, on QEMU we have
> > > 
> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> > >   01000013)
> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> > > BXPC 00000001)
> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> > > BXPC 00000001)
> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> > > BXPC 00000001)
> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> > > BXPC 00000001)
> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> > > BXPC 00000001)
> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> > > BXPC 00000001)
> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> > > BXPC 00000001)
> > > 
> > > covered by
> > > 
> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > >  ...
> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> > 
> > OK. I mistakenly understood those regions could be freed after exiting
> > UEFI boot services.
> > 
> > > 
> > > >> So it seems that kexec does not honour the memblock_reserve() table
> > > >> when booting the next kernel.
> > > >
> > > > not really.
> > > >
> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> > > >> > on crash dump kernel?)
> > > >> >
> > > >>
> > > >> I don't think so. And the change to the handling of ACPI reclaim
> > > >> regions only revealed the bug, not created it (given that other
> > > >> memblock_reserve regions may be affected as well)
> > > >
> > > > As whether we should honor such reserved regions over kexec'ing
> > > > depends on each one's specific nature, we will have to take care one-by-one.
> > > > As a matter of fact, no information about "reserved" memblocks is
> > > > exposed to user space (via proc/iomem).
> > > >
> > > 
> > > That is why I suggested (somewhere in this thread?) to not expose them
> > > as 'System RAM'. Do you think that could solve this?
> > 
> > Memblock-reserv'ing them is necessary to prevent their corruption and
> > marking them under another name in /proc/iomem would also be good in order
> > not to allocate them as part of crash kernel's memory.
> > 
> > But I'm not still convinced that we should export them in useable-
> > memory-range to crash dump kernel. They will be accessed through
> > acpi_os_map_memory() and so won't be required to be part of system ram
> > (or memblocks), I guess.
> > 	-> Bhupesh?
> 
> I forgot how arm64 kernel retrieve the memory ranges and initialize
> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> the memory according to the efi memmap?  For kdump kernel anything other
> than usable memory (which is from the dt node instead) should be
> reinitialized according to efi passed info, no?

All the regions exported in efi memmap will be added to memblock.memory
in (u)efi_init() and then trimmed down to the exact range specified as
usable-memory-range by fdt_enforce_memory_region().

Now I noticed that the current fdt_enforce_memory_region() may not work well
with multiple entries in usable-memory-range.
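
To make the multiple-entry concern concrete, here is a minimal user-space
sketch (not kernel code). As I read it, the current code caps memblock to a
single base/size pair, whereas handling several entries would need something
like an intersection of every memory region with every usable range. The
names 'struct range' and 'clip_to_usable()' are hypothetical and only serve
the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open physical range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Intersect each memory region with each usable range and store the
 * non-empty pieces in out[]. Returns the number of pieces written.
 * Assumes the usable[] entries do not overlap each other.
 */
static size_t clip_to_usable(const struct range *mem, size_t n_mem,
			     const struct range *usable, size_t n_usable,
			     struct range *out, size_t out_cap)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_mem; i++) {
		for (size_t j = 0; j < n_usable; j++) {
			uint64_t s = mem[i].start > usable[j].start ?
				     mem[i].start : usable[j].start;
			uint64_t e = mem[i].end < usable[j].end ?
				     mem[i].end : usable[j].end;

			if (s >= e)
				continue;	/* no overlap with this range */

			assert(n_out < out_cap);
			out[n_out].start = s;
			out[n_out].end = e;
			n_out++;
		}
	}
	return n_out;
}
```

With one memory region and two usable ranges inside it, this yields two
pieces, while capping to a single range can only ever keep one of them.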

> > 
> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > via a kernel command line parameter, "memmap=".
> 
> memmap= is only used in old kexec-tools, now we are passing them via
> e820 table.

Thanks. I remember that you have explained it before.

-Takahiro AKASHI

> [snip]
> 
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-19  6:09                                                           ` AKASHI Takahiro
@ 2017-12-19 13:09                                                               ` Ard Biesheuvel
  -1 siblings, 0 replies; 135+ messages in thread
From: Ard Biesheuvel @ 2017-12-19 13:09 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Ard Biesheuvel, Bhupesh Sharma,
	Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On 19 December 2017 at 07:09, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > > >> > Bhupesh, Ard,
>> > > >> >
>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> > > >> >> Hi Ard, Akashi
>> > > >> >>
>> > > >> > (snip)
>> > > >> >
>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> > > >> >> , for details)
>> > > >> >
>> > > >> > Right.
>> > > >> >
>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> > > >> >> with the crashkernel memory range:
>> > > >> >>
>> > > >> >>                 /* add linux,usable-memory-range */
>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> > > >> >>                                 address_cells, size_cells);
>> > > >> >>
>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> > > >> >> , for details)
>> > > >> >>
>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> > > >> >> they are marked as System RAM or as RESERVED. As,
>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> > > >> >>
>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> > > >> >> ACPI memory and crashes while trying to access the same:
>> > > >> >>
>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> > > >> >> -r`.img --reuse-cmdline -d
>> > > >> >>
>> > > >> >> [snip..]
>> > > >> >>
>> > > >> >> Reserved memory range
>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> > > >> >>
>> > > >> >> Coredump memory ranges
>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> > > >> >> 000000002e800000-000000003961ffff (0)
>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> > > >> >> 000000a000000000-000000affbffffff (0)
>> > > >> >>
>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> > > >> >> memory cap'ing passed to the crash kernel inside
>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> > > >> >>
>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> > > >> >> {
>> > > >> >>         struct memblock_region reg = {
>> > > >> >>                 .size = 0,
>> > > >> >>         };
>> > > >> >>
>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> > > >> >>
>> > > >> >>         if (reg.size)
>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> > > >> >> comment this out */
>> > > >> >> }
>> > > >> >
>> > > >> > Please just don't do that. It can cause a fatal damage on
>> > > >> > memory contents of the *crashed* kernel.
>> > > >> >
>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> > > >> >>
>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> > > >> >> fail.
>> > > >> >>
>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> > > >> >> dt node 'linux,usable-memory-range'
>> > > >> >
>> > > >> > I still don't understand why we need to carry over the information
>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> > > >> > such regions are free to be reused by the kernel after some point of
>> > > >> > initialization. Why does crash dump kernel need to know about them?
>> > > >> >
>> > > >>
>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> > > >> kernel, those regions needs to be preserved, which is why they are
>> > > >> memblock_reserve()'d now.
>> > > >
>> > > > For my better understandings, who is actually accessing such regions
>> > > > during boot time, uefi itself or efistub?
>> > > >
>> > >
>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> > > instance, on QEMU we have
>> > >
>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> > >   01000013)
>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> > > BXPC 00000001)
>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> > > BXPC 00000001)
>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> > > BXPC 00000001)
>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> > > BXPC 00000001)
>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> > > BXPC 00000001)
>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> > > BXPC 00000001)
>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> > > BXPC 00000001)
>> > >
>> > > covered by
>> > >
>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> > >  ...
>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >
>> > OK. I mistakenly understood those regions could be freed after exiting
>> > UEFI boot services.
>> >
>> > >
>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> > > >> when booting the next kernel.
>> > > >
>> > > > not really.
>> > > >
>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> > > >> > on crash dump kernel?)
>> > > >> >
>> > > >>
>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> > > >> regions only revealed the bug, not created it (given that other
>> > > >> memblock_reserve regions may be affected as well)
>> > > >
>> > > > As whether we should honor such reserved regions over kexec'ing
>> > > > depends on each one's specific nature, we will have to take care one-by-one.
>> > > > As a matter of fact, no information about "reserved" memblocks is
>> > > > exposed to user space (via proc/iomem).
>> > > >
>> > >
>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> > > as 'System RAM'. Do you think that could solve this?
>> >
>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> > marking them under another name in /proc/iomem would also be good in order
>> > not to allocate them as part of crash kernel's memory.
>> >
>> > But I'm not still convinced that we should export them in useable-
>> > memory-range to crash dump kernel. They will be accessed through
>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> > (or memblocks), I guess.
>> >     -> Bhupesh?
>>
>> I forgot how arm64 kernel retrieve the memory ranges and initialize
>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
>> the memory according to the efi memmap?  For kdump kernel anything other
>> than usable memory (which is from the dt node instead) should be
>> reinitialized according to efi passed info, no?
>
> All the regions exported in efi memmap will be added to memblock.memory
> in (u)efi_init() and then trimmed down to the exact range specified as
> usable-memory-range by fdt_enforce_memory_region().
>
> Now I noticed that the current fdt_enforce_memory_region() may not work well
> with multiple entries in usable-memory-range.
>

In any case, the root of the problem is that memory regions lose their
'memory' annotation due to the way the memory map is mangled before
being supplied to the kexec kernel.

Would it be possible to classify all memory that we want to hide from
the kexec kernel as NOMAP instead? That way, it will not be mapped
implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
so this seems to be the most appropriate way to deal with the host
kernel's memory contents.
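
To illustrate the idea, here is a rough user-space model (not kernel code).
It assumes that the linear mapping skips NOMAP regions and that the ACPI
mapping path picks cacheable attributes for anything still known as memory,
NOMAP or not. All names ('struct mem_region', 'in_linear_map()',
'acpi_map_type()') are made up for this sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum region_flags { REGION_NONE = 0, REGION_NOMAP = 1 };
enum map_type { MAP_DEVICE, MAP_CACHED };

/* Hypothetical stand-in for one entry in the memory table. */
struct mem_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* Return the region containing addr, or NULL if addr is not memory. */
static const struct mem_region *find_region(const struct mem_region *tbl,
					     size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= tbl[i].base && addr - tbl[i].base < tbl[i].size)
			return &tbl[i];
	}
	return NULL;
}

/* Only memory that is not marked NOMAP ends up in the linear mapping. */
static bool in_linear_map(const struct mem_region *tbl, size_t n,
			  uint64_t addr)
{
	const struct mem_region *r = find_region(tbl, n, addr);

	return r && !(r->flags & REGION_NOMAP);
}

/*
 * Known memory is mapped cacheable on demand, even when it is NOMAP;
 * anything else is treated as device memory.
 */
static enum map_type acpi_map_type(const struct mem_region *tbl, size_t n,
				   uint64_t addr)
{
	return find_region(tbl, n, addr) ? MAP_CACHED : MAP_DEVICE;
}
```

In this model a NOMAP region keeps its 'memory' annotation, so it is hidden
from the linear map but still gets a cacheable mapping when the ACPI code
asks for one.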

>> >
>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> > via a kernel command line parameter, "memmap=".
>>
>> memmap= is only used in old kexec-tools, now we are passing them via
>> e820 table.
>
> Thanks. I remember that you have explained it before.
>
> -Takahiro AKASHI
>
>> [snip]
>>
>> Thanks
>> Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-19  5:01                                                                     ` AKASHI Takahiro
  (?)
  (?)
@ 2017-12-20 19:52                                                                         ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-20 19:52 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma,
	Ard Biesheuvel, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, James Morse,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, Matt Fleming

On Tue, Dec 19, 2017 at 10:31 AM, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>>
>> [snip..]
>>
>> [    0.000000] linux,usable-memory-range base e800000, size 20000000
>> [    0.000000]  - e800000 ,  20000000
>> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
>> [    0.000000]  - 396c0000 ,  a0000
>> [    0.000000] linux,usable-memory-range base 39770000, size 40000
>> [    0.000000]  - 39770000 ,  40000
>> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
>> [    0.000000]  - 398a0000 ,  20000
>> [    0.000000] initrd not fully accessible via the linear mapping --
>> please check your bootloader ...
>
> This is an odd message coming from:
> |void __init arm64_memblock_init(void)
> |...
> |
> |                if (WARN(base < memblock_start_of_DRAM() ||
> |                         base + size > memblock_start_of_DRAM() +
> |                                       linear_region_size,
> |                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
>
> Can you confirm how the condition breaks here?
> I suppose
>     base: 0xfe70000
>     size: 0x13c0000
>     memblock_start_of_DRAM(): 0xe800000
> according to the information you gave me.

Indeed, the first condition, 'base < memblock_start_of_DRAM()', in the
following check evaluates to true and triggers the warning:

        if (WARN(base < memblock_start_of_DRAM() ||
             base + size > memblock_start_of_DRAM() +
                       linear_region_size,

Here are the values I am seeing on this board using the kernel and
kexec-tools which have been modified to append the
'linux,usable-memory-range' with the acpi reclaim regions:

base=0xfe70000
size=0x13c0000
memblock_start_of_DRAM=0x39620000
linear_region_size=0x800000000000
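
As a sanity check, the condition can be evaluated in user space with the
values above, treating them as hexadecimal. 'initrd_outside_linear_map()' is
a hypothetical helper that only mirrors the comparison inside the WARN(); it
is not a kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the initrd range [base, base + size) is not fully
 * covered by the window that starts at dram_start and spans
 * linear_region_size bytes, i.e. when the warning would fire.
 */
static bool initrd_outside_linear_map(uint64_t base, uint64_t size,
				      uint64_t dram_start,
				      uint64_t linear_region_size)
{
	return base < dram_start ||
	       base + size > dram_start + linear_region_size;
}
```

With dram_start = 0x39620000 the first clause is true because 0xfe70000 is
below it. With the value Akashi expected, 0xe800000, both clauses are false
and the warning would not fire.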

I suspect that the holes introduced by kexec-tools inside
'arm64_load_other_segments()' in 'kexec/arch/arm64/kexec-arm64.c' (see
the code snippet below):

    /* Put the other segments after the image. */

    hole_min = image_base + arm64_mem.image_size;
    if (info->kexec_flags & KEXEC_ON_CRASH)
        hole_max = crash_reserved_mem.end;
    else
        hole_max = ULONG_MAX;


should be updated to handle the acpi reclaim regions appropriately.
I am not aware of the background of this handling in kexec-tools.
Do you think this could be at fault, Akashi?

Regards,
Bhupesh



>
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [    0.000000] sp : ffff000008ccfe80
>> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [    0.000000] Call trace:
>> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [    0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [    0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [    0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [    0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [    0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [    0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [    0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [    0.000000] cma: Failed to reserve 512 MiB
>> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
>> ------------   4.14.0+ #7
>> [    0.000000] Call trace:
>> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>
>> >> > _______________________________________________
>> >> > kexec mailing list -- kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org
>> >> > To unsubscribe send an email to kexec-leave-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-20 19:52                                                                         ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-20 19:52 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma,
	Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, linux-efi, Mark Rutland,
	Matt Fleming

On Tue, Dec 19, 2017 at 10:31 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>>
>> [snip..]
>>
>> [    0.000000] linux,usable-memory-range base e800000, size 20000000
>> [    0.000000]  - e800000 ,  20000000
>> [    0.000000] linux,usable-memory-range base 396c0000, size a0000
>> [    0.000000]  - 396c0000 ,  a0000
>> [    0.000000] linux,usable-memory-range base 39770000, size 40000
>> [    0.000000]  - 39770000 ,  40000
>> [    0.000000] linux,usable-memory-range base 398a0000, size 20000
>> [    0.000000]  - 398a0000 ,  20000
>> [    0.000000] initrd not fully accessible via the linear mapping --
>> please check your bootloader ...
>
> This is an odd message coming from:
> |void __init arm64_memblock_init(void)
> |...
> |
> |                if (WARN(base < memblock_start_of_DRAM() ||
> |                         base + size > memblock_start_of_DRAM() +
> |                                       linear_region_size,
> |                        "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
>
> Can you confirm how the condition breaks here?
> I suppose
>     base: 0xfe70000
>     size: 0x13c0000
>     memblock_start_of_DRAM(): 0xe800000
> according to the information you gave me.

Indeed, the first condition, 'base < memblock_start_of_DRAM()', in the
following check evaluates to true and triggers the warning:

        if (WARN(base < memblock_start_of_DRAM() ||
             base + size > memblock_start_of_DRAM() +
                       linear_region_size,

Here are the values I am seeing on this board using the kernel and
kexec-tools which have been modified to append the
'linux,usable-memory-range' with the acpi reclaim regions:

base=0xfe70000
size=0x13c0000
memblock_start_of_DRAM=0x39620000
linear_region_size=0x800000000000

I suspect that the holes introduced by kexec-tools inside
'arm64_load_other_segments()' in 'kexec/arch/arm64/kexec-arm64.c' (see
the code snippet below):

    /* Put the other segments after the image. */

    hole_min = image_base + arm64_mem.image_size;
    if (info->kexec_flags & KEXEC_ON_CRASH)
        hole_max = crash_reserved_mem.end;
    else
        hole_max = ULONG_MAX;


should be updated to introduce appropriate handling of the acpi reclaim regions.
I am not aware of the background of this handling in the kexec-tools.
Do you think this can be at fault, Akashi?
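
For illustration only, the hole selection above can be sketched as a
small standalone C helper. The struct and function names below are
hypothetical stand-ins, not the kexec-tools types, and the sketch
assumes the end address of the crash region is inclusive.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reserved region; 'end' is inclusive. */
struct mem_range {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical pair of bounds for placing the other segments. */
struct hole {
    unsigned long min;
    unsigned long max;
};

/* Same selection as the snippet above: start after the image and,
 * for a crash kernel, stop at the end of the crash-reserved region. */
static struct hole segment_hole(unsigned long image_base,
                                unsigned long image_size,
                                bool on_crash,
                                const struct mem_range *crash_reserved)
{
    struct hole h;

    h.min = image_base + image_size;
    h.max = on_crash ? crash_reserved->end : ULONG_MAX;
    return h;
}

/* True if [base, base + size) lies entirely inside the hole.
 * Written with subtraction so that base + size cannot overflow. */
static bool fits_in_hole(const struct hole *h,
                         unsigned long base, unsigned long size)
{
    return size != 0 && base >= h->min && base <= h->max &&
           size - 1 <= h->max - base;
}
```

Assuming a crash region of 0xe800000-0x2e7fffff (as in the log) and a
made-up image placement, an initrd at 0xfe70000 fits in the hole,
while a range such as 0x396c0000 + 0xa0000 does not, because the
bounds depend only on the image and the crash-reserved region.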

Regards,
Bhupesh



>
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [    0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [    0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [    0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [    0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [    0.000000] sp : ffff000008ccfe80
>> [    0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [    0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [    0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [    0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [    0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [    0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [    0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [    0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [    0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [    0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [    0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [    0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [    0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [    0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [    0.000000] Call trace:
>> [    0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [    0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [    0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [    0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [    0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [    0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [    0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [    0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [    0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [    0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [    0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [    0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [    0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [    0.000000] cma: Failed to reserve 512 MiB
>> [    0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W
>> ------------   4.14.0+ #7
>> [    0.000000] Call trace:
>> [    0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [    0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [    0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [    0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [    0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [    0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [    0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [    0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [    0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [    0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [    0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [    0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-19 13:09                                                               ` Ard Biesheuvel
@ 2017-12-20 20:00                                                                   ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-20 20:00 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: AKASHI Takahiro, Dave Young, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On 19 December 2017 at 07:09, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> > > >> > Bhupesh, Ard,
>>> > > >> >
>>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>>> > > >> >> Hi Ard, Akashi
>>> > > >> >>
>>> > > >> > (snip)
>>> > > >> >
>>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>>> > > >> >> , for details)
>>> > > >> >
>>> > > >> > Right.
>>> > > >> >
>>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>>> > > >> >> with the crashkernel memory range:
>>> > > >> >>
>>> > > >> >>                 /* add linux,usable-memory-range */
>>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>>> > > >> >>                                 address_cells, size_cells);
>>> > > >> >>
>>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>>> > > >> >> , for details)
>>> > > >> >>
>>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>>> > > >> >> they are marked as System RAM or as RESERVED. As,
>>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>>> > > >> >>
>>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>>> > > >> >> ACPI memory and crashes while trying to access the same:
>>> > > >> >>
>>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>>> > > >> >> -r`.img --reuse-cmdline -d
>>> > > >> >>
>>> > > >> >> [snip..]
>>> > > >> >>
>>> > > >> >> Reserved memory range
>>> > > >> >> 000000000e800000-000000002e7fffff (0)
>>> > > >> >>
>>> > > >> >> Coredump memory ranges
>>> > > >> >> 0000000000000000-000000000e7fffff (0)
>>> > > >> >> 000000002e800000-000000003961ffff (0)
>>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>>> > > >> >> 000000a000000000-000000affbffffff (0)
>>> > > >> >>
>>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>>> > > >> >> memory cap'ing passed to the crash kernel inside
>>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>>> > > >> >>
>>> > > >> >> static void __init fdt_enforce_memory_region(void)
>>> > > >> >> {
>>> > > >> >>         struct memblock_region reg = {
>>> > > >> >>                 .size = 0,
>>> > > >> >>         };
>>> > > >> >>
>>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>>> > > >> >>
>>> > > >> >>         if (reg.size)
>>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>>> > > >> >> comment this out */
>>> > > >> >> }
>>> > > >> >
>>> > > >> > Please just don't do that. It can cause a fatal damage on
>>> > > >> > memory contents of the *crashed* kernel.
>>> > > >> >
>>> > > >> >> 5). Both the above temporary solutions fix the problem.
>>> > > >> >>
>>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>>> > > >> >> fail.
>>> > > >> >>
>>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>>> > > >> >> dt node 'linux,usable-memory-range'
>>> > > >> >
>>> > > >> > I still don't understand why we need to carry over the information
>>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>>> > > >> > such regions are free to be reused by the kernel after some point of
>>> > > >> > initialization. Why does crash dump kernel need to know about them?
>>> > > >> >
>>> > > >>
>>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>>> > > >> kernel, those regions needs to be preserved, which is why they are
>>> > > >> memblock_reserve()'d now.
>>> > > >
>>> > > > For my better understandings, who is actually accessing such regions
>>> > > > during boot time, uefi itself or efistub?
>>> > > >
>>> > >
>>> > > No, only the kernel. This is where the ACPI tables are stored. For
>>> > > instance, on QEMU we have
>>> > >
>>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>>> > >   01000013)
>>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>>> > > BXPC 00000001)
>>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>>> > > BXPC 00000001)
>>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>>> > > BXPC 00000001)
>>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>>> > > BXPC 00000001)
>>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>>> > > BXPC 00000001)
>>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>>> > > BXPC 00000001)
>>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>>> > > BXPC 00000001)
>>> > >
>>> > > covered by
>>> > >
>>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>>> > >  ...
>>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>>> >
>>> > OK. I mistakenly understood those regions could be freed after exiting
>>> > UEFI boot services.
>>> >
>>> > >
>>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>>> > > >> when booting the next kernel.
>>> > > >
>>> > > > not really.
>>> > > >
>>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>>> > > >> > on crash dump kernel?)
>>> > > >> >
>>> > > >>
>>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>>> > > >> regions only revealed the bug, not created it (given that other
>>> > > >> memblock_reserve regions may be affected as well)
>>> > > >
>>> > > > As whether we should honor such reserved regions over kexec'ing
>>> > > > depends on each one's specific nature, we will have to take care one-by-one.
>>> > > > As a matter of fact, no information about "reserved" memblocks is
>>> > > > exposed to user space (via proc/iomem).
>>> > > >
>>> > >
>>> > > That is why I suggested (somewhere in this thread?) to not expose them
>>> > > as 'System RAM'. Do you think that could solve this?
>>> >
>>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>>> > marking them under another name in /proc/iomem would also be good in order
>>> > not to allocate them as part of crash kernel's memory.
>>> >
>>> > But I'm not still convinced that we should export them in useable-
>>> > memory-range to crash dump kernel. They will be accessed through
>>> > acpi_os_map_memory() and so won't be required to be part of system ram
>>> > (or memblocks), I guess.
>>> >     -> Bhupesh?
>>>
>>> I forgot how arm64 kernel retrieve the memory ranges and initialize
>>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
>>> the memory according to the efi memmap?  For kdump kernel anything other
>>> than usable memory (which is from the dt node instead) should be
>>> reinitialized according to efi passed info, no?
>>
>> All the regions exported in efi memmap will be added to memblock.memory
>> in (u)efi_init() and then trimmed down to the exact range specified as
>> usable-memory-range by fdt_enforce_memory_region().
>>
>> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> with multiple entries in usable-memory-range.
>>
>
> In any case, the root of the problem is that memory regions lose their
> 'memory' annotation due to the way the memory map is mangled before
> being supplied to the kexec kernel.
>
> Would it be possible to classify all memory that we want to hide from
> the kexec kernel as NOMAP instead? That way, it will not be mapped
> implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> so this seems to be the most appropriate way to deal with the host
> kernel's memory contents.

Hmm, wouldn't appending the ACPI reclaim regions to
'linux,usable-memory-range' in the dtb being passed to the crashkernel
be better? It indirectly achieves a similar objective (although it may
cover only a subset of all System RAM regions in the primary
kernel's memory).
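
As a rough sketch of what "appending" could mean at the property
level, the helper below packs several <base size> pairs into one
buffer of big-endian cells, assuming #address-cells = 2 and
#size-cells = 2. The struct and function names are hypothetical, this
is not the kexec-tools or libfdt API, and whether the crash kernel
parses more than one pair is a separate question (see Akashi's remark
about fdt_enforce_memory_region() above).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one usable range. */
struct usable_range {
    uint64_t base;
    uint64_t size;
};

/* Store a 64-bit value as 8 big-endian bytes (two 32-bit cells). */
static void put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/*
 * Pack n ranges as consecutive <base size> pairs, 16 bytes each.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t pack_usable_ranges(uint8_t *buf, size_t buf_len,
                                 const struct usable_range *r, size_t n)
{
    if (n > SIZE_MAX / 16 || n * 16 > buf_len)
        return 0;

    for (size_t i = 0; i < n; i++) {
        put_be64(buf + 16 * i, r[i].base);
        put_be64(buf + 16 * i + 8, r[i].size);
    }
    return n * 16;
}
```

The resulting buffer is what would be handed to a property setter in
place of the single crashkernel range; the ranges used on this board
would be the crashkernel region followed by each ACPI reclaim region.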

I am not aware of the background of the current kexec-tools
implementation, where we add only the crashkernel range to the dtb
being passed to the crashkernel.

Akashi can probably answer better how we arrived at this design
approach and why we didn't want to expose all System RAM regions (i.e.
!NOMAP regions) to the crashkernel.

I suspect that some issues were seen when the System RAM (!NOMAP)
regions were exposed to the crashkernel, and that's why we settled on
this design approach, but this is just my guess.

Regards,
Bhupesh

>>> >
>>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>>> > via a kernel command line parameter, "memmap=".
>>>
>>> memmap= is only used in old kexec-tools, now we are passing them via
>>> e820 table.
>>
>> Thanks. I remember that you have explained it before.
>>
>> -Takahiro AKASHI
>>
>>> [snip]
>>>
>>> Thanks
>>> Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-20 20:00                                                                   ` Bhupesh Sharma
  (?)
@ 2017-12-21 10:34                                                                       ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-21 10:34 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Bhupesh,

Can you test the patch attached below, please?

It is intended to retain already-reserved regions (ACPI reclaim memory
in this case) in System RAM (i.e. memblock.memory) without explicitly
exporting them via usable-memory-range.
(I still have to figure out what side effects this patch may have.)

Thanks,
-Takahiro AKASHI

On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > On 19 December 2017 at 07:09, AKASHI Takahiro
> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >>> > > >> > Bhupesh, Ard,
> >>> > > >> >
> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >>> > > >> >> Hi Ard, Akashi
> >>> > > >> >>
> >>> > > >> > (snip)
> >>> > > >> >
> >>> > > >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
> >>> > > >> >> 'linux,usable-memory-range' dt property to allow the crash dump kernel
> >>> > > >> >> to identify its own usable memory and exclude, at boot time, any
> >>> > > >> >> other memory areas that are part of the panicked kernel's memory
> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >>> > > >> >> for details).
> >>> > > >> >
> >>> > > >> > Right.
> >>> > > >> >
> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >>> > > >> >> with the crashkernel memory range:
> >>> > > >> >>
> >>> > > >> >>                 /* add linux,usable-memory-range */
> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >>> > > >> >>                                 address_cells, size_cells);
> >>> > > >> >>
> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >>> > > >> >> , for details)
> >>> > > >> >>
> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >>> > > >> >> they are marked as System RAM or as RESERVED, because the
> >>> > > >> >> 'linux,usable-memory-range' dt property is patched up only with
> >>> > > >> >> 'crash_reserved_mem' and not with 'system_memory_ranges'.
> >>> > > >> >>
> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >>> > > >> >>
> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >>> > > >> >>
> >>> > > >> >> [snip..]
> >>> > > >> >>
> >>> > > >> >> Reserved memory range
> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >>> > > >> >>
> >>> > > >> >> Coredump memory ranges
> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >>> > > >> >>
> >>> > > >> >> 4). So if we revert Ard's patch, or just comment out the capping of
> >>> > > >> >> the memory passed to the crash kernel inside
> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >>> > > >> >>
> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >>> > > >> >> {
> >>> > > >> >>         struct memblock_region reg = {
> >>> > > >> >>                 .size = 0,
> >>> > > >> >>         };
> >>> > > >> >>
> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> > > >> >>
> >>> > > >> >>         if (reg.size)
> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >>> > > >> >> }
> >>> > > >> >
> >>> > > >> > Please just don't do that. It can fatally damage the
> >>> > > >> > memory contents of the *crashed* kernel.
> >>> > > >> >
> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >>> > > >> >>
> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> > > >> >> fail.
> >>> > > >> >>
> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >>> > > >> >> dt node 'linux,usable-memory-range'
> >>> > > >> >
> >>> > > >> > I still don't understand why we need to carry over the information
> >>> > > >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >>> > > >> > initialization. Why does the crash dump kernel need to know about them?
> >>> > > >> >
> >>> > > >>
> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >>> > > >> memblock_reserve()'d now.
> >>> > > >
> >>> > > > For my better understanding, who actually accesses such regions
> >>> > > > during boot time, UEFI itself or the EFI stub?
> >>> > > >
> >>> > >
> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >>> > > instance, on QEMU we have
> >>> > >
> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001   01000013)
> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
> >>> > >
> >>> > > covered by
> >>> > >
> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >>> > >  ...
> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >>> >
> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >>> > UEFI boot services.
> >>> >
> >>> > >
> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >>> > > >> when booting the next kernel.
> >>> > > >
> >>> > > > not really.
> >>> > > >
> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >>> > > >> > on crash dump kernel?)
> >>> > > >> >
> >>> > > >>
> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >>> > > >> regions only revealed the bug, not created it (given that other
> >>> > > >> memblock_reserve regions may be affected as well)
> >>> > > >
> >>> > > > Whether we should honor such reserved regions across kexec
> >>> > > > depends on each one's specific nature, so we will have to handle them
> >>> > > > one by one. As a matter of fact, no information about "reserved"
> >>> > > > memblocks is exposed to user space (via /proc/iomem).
> >>> > > >
> >>> > >
> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >>> > > as 'System RAM'. Do you think that could solve this?
> >>> >
> >>> > Reserving them in memblock is necessary to prevent their corruption, and
> >>> > marking them under another name in /proc/iomem would also be good so that
> >>> > they are not allocated as part of the crash kernel's memory.
> >>> >
> >>> > But I'm still not convinced that we should export them in
> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
> >>> > acpi_os_map_memory() and so won't need to be part of System RAM
> >>> > (or memblocks), I guess.
> >>> >     -> Bhupesh?
> >>>
> >>> I forgot how the arm64 kernel retrieves the memory ranges and initializes
> >>> them.  If there is no "e820"-like interface, shouldn't the kernel reinitialize
> >>> all the memory according to the EFI memmap?  For the kdump kernel, anything
> >>> other than usable memory (which comes from the dt node instead) should be
> >>> reinitialized according to the info passed by EFI, no?
> >>
> >> All the regions exported in efi memmap will be added to memblock.memory
> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> usable-memory-range by fdt_enforce_memory_region().
> >>
> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> with multiple entries in usable-memory-range.
> >>
> >
> > In any case, the root of the problem is that memory regions lose their
> > 'memory' annotation due to the way the memory map is mangled before
> > being supplied to the kexec kernel.
> >
> > Would it be possible to classify all memory that we want to hide from
> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> > so this seems to be the most appropriate way to deal with the host
> > kernel's memory contents.
> 
> Hmm, wouldn't appending the ACPI reclaim regions to
> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> be better? It indirectly achieves a similar objective
> (although the result may be only a subset of all System RAM regions in
> the primary kernel's memory).
> 
> I am not aware of the background of the current kexec-tools
> implementation, where we add only the crashkernel range to the dtb
> being passed to the crashkernel.
> 
> Probably Akashi can better answer how we arrived at this design
> approach and why we didn't want to expose all System RAM regions (i.e.
> !NOMAP regions) to the crashkernel.
> 
> I suspect that some issues were seen when the System RAM (!NOMAP
> regions) was exposed to the crashkernel, and that's why we
> settled on this design approach, but this is just
> my guess.
> 
> Regards,
> Bhupesh
> 
> >>> >
> >>> > Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >>> > via a kernel command line parameter, "memmap=".
> >>>
> >>> memmap= is only used in old kexec-tools; now we pass them via the
> >>> e820 table.
> >>
> >> Thanks. I remember that you have explained it before.
> >>
> >> -Takahiro AKASHI
> >>
> >>> [snip]
> >>>
> >>> Thanks
> >>> Dave

===8<==
From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Date: Thu, 21 Dec 2017 19:14:23 +0900
Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP

---
 arch/arm64/mm/init.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b900ca41..8175db94257b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
 	struct memblock_region reg = {
 		.size = 0,
 	};
+	u64 idx;
+	phys_addr_t start, end;
 
 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
 
-	if (reg.size)
-		memblock_cap_memory_range(reg.base, reg.size);
+	if (reg.size) {
+		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
+					&start, &end, NULL)
+			memblock_mark_nomap(start, end - start);
+		memblock_clear_nomap(reg.base, reg.size);
+	}
 }
 
 void __init arm64_memblock_init(void)
-- 
2.15.1
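
To make the intended effect of the patch above easier to follow, here is a
small user-space model of its logic. It is only a sketch under stated
assumptions: the address space is reduced to 16 equal chunks, and
'struct toy_memblock', 'toy_enforce_usable_range()' and 'toy_is_mapped()'
are hypothetical names that do not exist in the kernel. The model assumes
that for_each_free_mem_range() visits memory that is not reserved, so
already-reserved chunks (such as ACPI reclaim memory) are never marked
NOMAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUNITS 16	/* toy address space: 16 equally sized chunks */

/* Hypothetical stand-in for the memblock state, one flag per chunk. */
struct toy_memblock {
	bool memory[NUNITS];	/* chunk is in memblock.memory */
	bool reserved[NUNITS];	/* chunk is in memblock.reserved */
	bool nomap[NUNITS];	/* chunk carries MEMBLOCK_NOMAP */
};

/*
 * Model of the patched fdt_enforce_memory_region(): mark every free
 * (memory and not reserved) chunk NOMAP, then clear NOMAP again on the
 * usable range [base, base + size).
 */
void toy_enforce_usable_range(struct toy_memblock *mb,
			      size_t base, size_t size)
{
	assert(base <= NUNITS && size <= NUNITS - base);

	if (!size)
		return;

	for (size_t i = 0; i < NUNITS; i++)
		if (mb->memory[i] && !mb->reserved[i])
			mb->nomap[i] = true;

	for (size_t i = base; i < base + size; i++)
		mb->nomap[i] = false;
}

/* A chunk ends up in the linear mapping only if it is memory and not NOMAP. */
bool toy_is_mapped(const struct toy_memblock *mb, size_t i)
{
	return mb->memory[i] && !mb->nomap[i];
}
```

In this model, a reserved chunk outside the usable range stays mapped,
while free chunks outside the usable range become NOMAP. That is the
behaviour the patch aims for. Whether it has other side effects in the
real kernel is exactly the open question mentioned above.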

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-21 10:34                                                                       ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-21 10:34 UTC (permalink / raw)
  To: linux-arm-kernel

Bhupesh,

Can you test the patch attached below, please?

It is intended to retain already-reserved regions (ACPI reclaim memory
in this case) in System RAM (i.e. memblock.memory) without explicitly
exporting them via usable-memory-range.
(I still have to figure out what side effects this patch may have.)

Thanks,
-Takahiro AKASHI

On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> > On 19 December 2017 at 07:09, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >>> > > <takahiro.akashi@linaro.org> wrote:
> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >>> > > >> > Bhupesh, Ard,
> >>> > > >> >
> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >>> > > >> >> Hi Ard, Akashi
> >>> > > >> >>
> >>> > > >> > (snip)
> >>> > > >> >
> >>> > > >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
> >>> > > >> >> 'linux,usable-memory-range' dt property to allow the crash dump kernel
> >>> > > >> >> to identify its own usable memory and exclude, at boot time, any
> >>> > > >> >> other memory areas that are part of the panicked kernel's memory
> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >>> > > >> >> for details).
> >>> > > >> >
> >>> > > >> > Right.
> >>> > > >> >
> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >>> > > >> >> with the crashkernel memory range:
> >>> > > >> >>
> >>> > > >> >>                 /* add linux,usable-memory-range */
> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >>> > > >> >>                                 address_cells, size_cells);
> >>> > > >> >>
> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >>> > > >> >> , for details)
> >>> > > >> >>
> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >>> > > >> >> they are marked as System RAM or as RESERVED, because the
> >>> > > >> >> 'linux,usable-memory-range' dt property is patched up only with
> >>> > > >> >> 'crash_reserved_mem' and not with 'system_memory_ranges'.
> >>> > > >> >>
> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >>> > > >> >>
> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >>> > > >> >>
> >>> > > >> >> [snip..]
> >>> > > >> >>
> >>> > > >> >> Reserved memory range
> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >>> > > >> >>
> >>> > > >> >> Coredump memory ranges
> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >>> > > >> >>
> >>> > > >> >> 4). So if we revert Ard's patch, or just comment out the capping of
> >>> > > >> >> the memory passed to the crash kernel inside
> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >>> > > >> >>
> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >>> > > >> >> {
> >>> > > >> >>         struct memblock_region reg = {
> >>> > > >> >>                 .size = 0,
> >>> > > >> >>         };
> >>> > > >> >>
> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> > > >> >>
> >>> > > >> >>         if (reg.size)
> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >>> > > >> >> }
> >>> > > >> >
> >>> > > >> > Please just don't do that. It can fatally damage the
> >>> > > >> > memory contents of the *crashed* kernel.
> >>> > > >> >
> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >>> > > >> >>
> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> > > >> >> fail.
> >>> > > >> >>
> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >>> > > >> >> dt node 'linux,usable-memory-range'
> >>> > > >> >
> >>> > > >> > I still don't understand why we need to carry over the information
> >>> > > >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >>> > > >> > initialization. Why does the crash dump kernel need to know about them?
> >>> > > >> >
> >>> > > >>
> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >>> > > >> memblock_reserve()'d now.
> >>> > > >
> >>> > > > For my better understanding, who actually accesses such regions
> >>> > > > during boot time, UEFI itself or the EFI stub?
> >>> > > >
> >>> > >
> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >>> > > instance, on QEMU we have
> >>> > >
> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001   01000013)
> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
> >>> > >
> >>> > > covered by
> >>> > >
> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >>> > >  ...
> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >>> >
> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >>> > UEFI boot services.
> >>> >
> >>> > >
> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >>> > > >> when booting the next kernel.
> >>> > > >
> >>> > > > not really.
> >>> > > >
> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >>> > > >> > on crash dump kernel?)
> >>> > > >> >
> >>> > > >>
> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >>> > > >> regions only revealed the bug, not created it (given that other
> >>> > > >> memblock_reserve regions may be affected as well)
> >>> > > >
> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >>> > > > exposed to user space (via proc/iomem).
> >>> > > >
> >>> > >
> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >>> > > as 'System RAM'. Do you think that could solve this?
> >>> >
> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >>> > marking them under another name in /proc/iomem would also be good in order
> >>> > not to allocate them as part of crash kernel's memory.
> >>> >
> >>> > But I'm not still convinced that we should export them in useable-
> >>> > memory-range to crash dump kernel. They will be accessed through
> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >>> > (or memblocks), I guess.
> >>> >     -> Bhupesh?
> >>>
> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >>> than usable memory (which is from the dt node instead) should be
> >>> reinitialized according to efi passed info, no?
> >>
> >> All the regions exported in efi memmap will be added to memblock.memory
> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> usable-memory-range by fdt_enforce_memory_region().
> >>
> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> with multiple entries in usable-memory-range.
> >>
> >
> > In any case, the root of the problem is that memory regions lose their
> > 'memory' annotation due to the way the memory map is mangled before
> > being supplied to the kexec kernel.
> >
> > Would it be possible to classify all memory that we want to hide from
> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> > so this seems to be the most appropriate way to deal with the host
> > kernel's memory contents.
> 
> Hmm. wouldn't appending the acpi reclaim regions to
> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> be better? Because its indirectly achieving a similar objective
> (although may be a subset of all System RAM regions on the primary
> kernel's memory).
> 
> I am not aware of the background about the current kexec-tools
> implementation where we add only the crashkernel range to the dtb
> being passed to the crashkernel.
> 
> Probably Akashi can answer better, as to how we arrived at this design
> approach and why we didn't want to expose all System RAM regions (i.e.
> !NOMAP regions) to the crashkernel.
> 
> I am suspecting that some issues were seen/met when the System RAM (!
> NOMAP regions) were exposed to the crashkernel, and that's why we
> finalized on this design approach, but this is something which is just
> my guess.
> 
> Regards,
> Bhupesh
> 
> >>> >
> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >>> > via a kernel command line parameter, "memmap=".
> >>>
> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >>> e820 table.
> >>
> >> Thanks. I remember that you have explained it before.
> >>
> >> -Takahiro AKASHI
> >>
> >>> [snip]
> >>>
> >>> Thanks
> >>> Dave

===8<==
From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date: Thu, 21 Dec 2017 19:14:23 +0900
Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP

---
 arch/arm64/mm/init.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b900ca41..8175db94257b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
 	struct memblock_region reg = {
 		.size = 0,
 	};
+	u64 idx;
+	phys_addr_t start, end;
 
 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
 
-	if (reg.size)
-		memblock_cap_memory_range(reg.base, reg.size);
+	if (reg.size) {
+		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
+					&start, &end, NULL)
+			memblock_mark_nomap(start, end - start);
+		memblock_clear_nomap(reg.base, reg.size);
+	}
 }
 
 void __init arm64_memblock_init(void)
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-21 10:34                                                                       ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-21 10:34 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec,
	James Morse, Bhupesh SHARMA, Dave Young, linux-arm-kernel

Bhupesh,

Can you test the patch attached below, please?

It is intended to retain already-reserved regions (ACPI reclaim memory
in this case) in system ram (i.e. memblock.memory) without explicitly
exporting them via usable-memory-range.
(I still have to figure out what the side-effect of this patch is.)
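
To illustrate the intent, here is a toy user-space model. It is only a
sketch, not kernel code: struct toy_region and the toy_*() helpers are
made-up names, regions are assumed to be pre-split so that no range
splitting is needed, and "reserved" merely stands in for a
memblock_reserve()'d region such as ACPI reclaim memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for one memblock.memory entry; NOT the kernel's struct. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	bool present;	/* still listed as memory */
	bool reserved;	/* models memblock_reserve(), e.g. ACPI reclaim memory */
	bool nomap;	/* models MEMBLOCK_NOMAP */
};

/* True if region r lies entirely within [base, base + size). */
static bool toy_inside(const struct toy_region *r, uint64_t base, uint64_t size)
{
	return r->base >= base && r->base + r->size <= base + size;
}

/* Models the current code: everything outside the usable range is dropped. */
static void toy_cap(struct toy_region *r, int n, uint64_t base, uint64_t size)
{
	assert(size > 0);
	for (int i = 0; i < n; i++)
		if (!toy_inside(&r[i], base, size))
			r[i].present = false;
}

/*
 * Models the patch: free (non-reserved) memory is hidden as NOMAP, then
 * the usable range is re-exposed. Reserved regions are never touched, so
 * they stay in the memory list without NOMAP.
 */
static void toy_mark_nomap(struct toy_region *r, int n, uint64_t base,
			   uint64_t size)
{
	assert(size > 0);
	for (int i = 0; i < n; i++)
		if (!r[i].reserved)
			r[i].nomap = true;
	for (int i = 0; i < n; i++)
		if (toy_inside(&r[i], base, size))
			r[i].nomap = false;
}
```

With a usable range covering only the crashkernel region, toy_cap()
removes a reserved region outside that range from the list, while
toy_mark_nomap() leaves it listed as memory and only hides the
non-reserved regions outside the usable range.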

Thanks,
-Takahiro AKASHI

On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> > On 19 December 2017 at 07:09, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >>> > > <takahiro.akashi@linaro.org> wrote:
> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >>> > > >> > Bhupesh, Ard,
> >>> > > >> >
> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >>> > > >> >> Hi Ard, Akashi
> >>> > > >> >>
> >>> > > >> > (snip)
> >>> > > >> >
> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >>> > > >> >> , for details)
> >>> > > >> >
> >>> > > >> > Right.
> >>> > > >> >
> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >>> > > >> >> with the crashkernel memory range:
> >>> > > >> >>
> >>> > > >> >>                 /* add linux,usable-memory-range */
> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >>> > > >> >>                                 address_cells, size_cells);
> >>> > > >> >>
> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >>> > > >> >> , for details)
> >>> > > >> >>
> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >>> > > >> >>
> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >>> > > >> >>
> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >>> > > >> >> -r`.img --reuse-cmdline -d
> >>> > > >> >>
> >>> > > >> >> [snip..]
> >>> > > >> >>
> >>> > > >> >> Reserved memory range
> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >>> > > >> >>
> >>> > > >> >> Coredump memory ranges
> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >>> > > >> >>
> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >>> > > >> >>
> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >>> > > >> >> {
> >>> > > >> >>         struct memblock_region reg = {
> >>> > > >> >>                 .size = 0,
> >>> > > >> >>         };
> >>> > > >> >>
> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> > > >> >>
> >>> > > >> >>         if (reg.size)
> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >>> > > >> >> comment this out */
> >>> > > >> >> }
> >>> > > >> >
> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >>> > > >> > memory contents of the *crashed* kernel.
> >>> > > >> >
> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >>> > > >> >>
> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> > > >> >> fail.
> >>> > > >> >>
> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >>> > > >> >> dt node 'linux,usable-memory-range'
> >>> > > >> >
> >>> > > >> > I still don't understand why we need to carry over the information
> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >>> > > >> >
> >>> > > >>
> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >>> > > >> kernel, those regions needs to be preserved, which is why they are
> >>> > > >> memblock_reserve()'d now.
> >>> > > >
> >>> > > > For my better understandings, who is actually accessing such regions
> >>> > > > during boot time, uefi itself or efistub?
> >>> > > >
> >>> > >
> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >>> > > instance, on QEMU we have
> >>> > >
> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >>> > >   01000013)
> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >>> > > BXPC 00000001)
> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >>> > > BXPC 00000001)
> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >>> > > BXPC 00000001)
> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >>> > > BXPC 00000001)
> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >>> > > BXPC 00000001)
> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >>> > > BXPC 00000001)
> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >>> > > BXPC 00000001)
> >>> > >
> >>> > > covered by
> >>> > >
> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >>> > >  ...
> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >>> >
> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >>> > UEFI boot services.
> >>> >
> >>> > >
> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >>> > > >> when booting the next kernel.
> >>> > > >
> >>> > > > not really.
> >>> > > >
> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >>> > > >> > on crash dump kernel?)
> >>> > > >> >
> >>> > > >>
> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >>> > > >> regions only revealed the bug, not created it (given that other
> >>> > > >> memblock_reserve regions may be affected as well)
> >>> > > >
> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >>> > > > exposed to user space (via proc/iomem).
> >>> > > >
> >>> > >
> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >>> > > as 'System RAM'. Do you think that could solve this?
> >>> >
> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >>> > marking them under another name in /proc/iomem would also be good in order
> >>> > not to allocate them as part of crash kernel's memory.
> >>> >
> >>> > But I'm not still convinced that we should export them in useable-
> >>> > memory-range to crash dump kernel. They will be accessed through
> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >>> > (or memblocks), I guess.
> >>> >     -> Bhupesh?
> >>>
> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >>> than usable memory (which is from the dt node instead) should be
> >>> reinitialized according to efi passed info, no?
> >>
> >> All the regions exported in efi memmap will be added to memblock.memory
> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> usable-memory-range by fdt_enforce_memory_region().
> >>
> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> with multiple entries in usable-memory-range.
> >>
> >
> > In any case, the root of the problem is that memory regions lose their
> > 'memory' annotation due to the way the memory map is mangled before
> > being supplied to the kexec kernel.
> >
> > Would it be possible to classify all memory that we want to hide from
> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> > so this seems to be the most appropriate way to deal with the host
> > kernel's memory contents.
> 
> Hmm. wouldn't appending the acpi reclaim regions to
> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> be better? Because its indirectly achieving a similar objective
> (although may be a subset of all System RAM regions on the primary
> kernel's memory).
> 
> I am not aware of the background about the current kexec-tools
> implementation where we add only the crashkernel range to the dtb
> being passed to the crashkernel.
> 
> Probably Akashi can answer better, as to how we arrived at this design
> approach and why we didn't want to expose all System RAM regions (i.e.
> !NOMAP regions) to the crashkernel.
> 
> I am suspecting that some issues were seen/met when the System RAM (!
> NOMAP regions) were exposed to the crashkernel, and that's why we
> finalized on this design approach, but this is something which is just
> my guess.
> 
> Regards,
> Bhupesh
> 
> >>> >
> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >>> > via a kernel command line parameter, "memmap=".
> >>>
> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >>> e820 table.
> >>
> >> Thanks. I remember that you have explained it before.
> >>
> >> -Takahiro AKASHI
> >>
> >>> [snip]
> >>>
> >>> Thanks
> >>> Dave

===8<==
From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date: Thu, 21 Dec 2017 19:14:23 +0900
Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP

---
 arch/arm64/mm/init.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b900ca41..8175db94257b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
 	struct memblock_region reg = {
 		.size = 0,
 	};
+	u64 idx;
+	phys_addr_t start, end;
 
 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
 
-	if (reg.size)
-		memblock_cap_memory_range(reg.base, reg.size);
+	if (reg.size) {
+		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
+					&start, &end, NULL)
+			memblock_mark_nomap(start, end - start);
+		memblock_clear_nomap(reg.base, reg.size);
+	}
 }
 
 void __init arm64_memblock_init(void)
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-21 10:34                                                                       ` AKASHI Takahiro
  (?)
@ 2017-12-21 12:06                                                                           ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-21 12:06 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young,
	Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hello Akashi,

On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> Bhupesh,
>
> Can you test the patch attached below, please?
>
> It is intended to retain already-reserved regions (ACPI reclaim memory
> in this case) in system ram (i.e. memblock.memory) without explicitly
> exporting them via usable-memory-range.
> (I still have to figure out what the side-effect of this patch is.)
>
> Thanks,
> -Takahiro AKASHI
>
> On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
>> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
>> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > On 19 December 2017 at 07:09, AKASHI Takahiro
>> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >>> > > >> > Bhupesh, Ard,
>> >>> > > >> >
>> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >>> > > >> >> Hi Ard, Akashi
>> >>> > > >> >>
>> >>> > > >> > (snip)
>> >>> > > >> >
>> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >>> > > >> >> , for details)
>> >>> > > >> >
>> >>> > > >> > Right.
>> >>> > > >> >
>> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >>> > > >> >> with the crashkernel memory range:
>> >>> > > >> >>
>> >>> > > >> >>                 /* add linux,usable-memory-range */
>> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>> > > >> >>                                 address_cells, size_cells);
>> >>> > > >> >>
>> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >>> > > >> >> , for details)
>> >>> > > >> >>
>> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
>> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >>> > > >> >>
>> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >>> > > >> >> ACPI memory and crashes while trying to access the same:
>> >>> > > >> >>
>> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >>> > > >> >> -r`.img --reuse-cmdline -d
>> >>> > > >> >>
>> >>> > > >> >> [snip..]
>> >>> > > >> >>
>> >>> > > >> >> Reserved memory range
>> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >>> > > >> >>
>> >>> > > >> >> Coredump memory ranges
>> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >>> > > >> >>
>> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> > > >> >>
>> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >>> > > >> >> {
>> >>> > > >> >>         struct memblock_region reg = {
>> >>> > > >> >>                 .size = 0,
>> >>> > > >> >>         };
>> >>> > > >> >>
>> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> > > >> >>
>> >>> > > >> >>         if (reg.size)
>> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >>> > > >> >> comment this out */
>> >>> > > >> >> }
>> >>> > > >> >
>> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >>> > > >> > memory contents of the *crashed* kernel.
>> >>> > > >> >
>> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >>> > > >> >>
>> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> > > >> >> fail.
>> >>> > > >> >>
>> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >>> > > >> >
>> >>> > > >> > I still don't understand why we need to carry over the information
>> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
>> >>> > > >> >
>> >>> > > >>
>> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> > > >> kernel, those regions needs to be preserved, which is why they are
>> >>> > > >> memblock_reserve()'d now.
>> >>> > > >
>> >>> > > > For my better understandings, who is actually accessing such regions
>> >>> > > > during boot time, uefi itself or efistub?
>> >>> > > >
>> >>> > >
>> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >>> > > instance, on QEMU we have
>> >>> > >
>> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >>> > >   01000013)
>> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >>> > > BXPC 00000001)
>> >>> > >
>> >>> > > covered by
>> >>> > >
>> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>> > >  ...
>> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>> >
>> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >>> > UEFI boot services.
>> >>> >
>> >>> > >
>> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> > > >> when booting the next kernel.
>> >>> > > >
>> >>> > > > not really.
>> >>> > > >
>> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> > > >> > on crash dump kernel?)
>> >>> > > >> >
>> >>> > > >>
>> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> > > >> regions only revealed the bug, not created it (given that other
>> >>> > > >> memblock_reserve regions may be affected as well)
>> >>> > > >
>> >>> > > > As whether we should honor such reserved regions over kexec'ing
>> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
>> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >>> > > > exposed to user space (via proc/iomem).
>> >>> > > >
>> >>> > >
>> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >>> > > as 'System RAM'. Do you think that could solve this?
>> >>> >
>> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> >>> > marking them under another name in /proc/iomem would also be good in order
>> >>> > not to allocate them as part of crash kernel's memory.
>> >>> >
>> >>> > But I'm not still convinced that we should export them in useable-
>> >>> > memory-range to crash dump kernel. They will be accessed through
>> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >>> > (or memblocks), I guess.
>> >>> >     -> Bhupesh?
>> >>>
>> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
>> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
>> >>> the memory according to the efi memmap?  For kdump kernel anything other
>> >>> than usable memory (which is from the dt node instead) should be
>> >>> reinitialized according to efi passed info, no?
>> >>
>> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> usable-memory-range by fdt_enforce_memory_region().
>> >>
>> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> with multiple entries in usable-memory-range.
>> >>
>> >
>> > In any case, the root of the problem is that memory regions lose their
>> > 'memory' annotation due to the way the memory map is mangled before
>> > being supplied to the kexec kernel.
>> >
>> > Would it be possible to classify all memory that we want to hide from
>> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> > so this seems to be the most appropriate way to deal with the host
>> > kernel's memory contents.
>>
>> Hmm. wouldn't appending the acpi reclaim regions to
>> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> be better? Because its indirectly achieving a similar objective
>> (although may be a subset of all System RAM regions on the primary
>> kernel's memory).
>>
>> I am not aware of the background about the current kexec-tools
>> implementation where we add only the crashkernel range to the dtb
>> being passed to the crashkernel.
>>
>> Probably Akashi can answer better, as to how we arrived at this design
>> approach and why we didn't want to expose all System RAM regions (i.e.
>> !NOMAP regions) to the crashkernel.
>>
>> I am suspecting that some issues were seen/met when the System RAM (!
>> NOMAP regions) were exposed to the crashkernel, and that's why we
>> finalized on this design approach, but this is something which is just
>> my guess.
>>
>> Regards,
>> Bhupesh
>>
>> >>> >
>> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >>> > via a kernel command line parameter, "memmap=".
>> >>>
>> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >>> e820 table.
>> >>
>> >> Thanks. I remember that you have explained it before.
>> >>
>> >> -Takahiro AKASHI
>> >>
>> >>> [snip]
>> >>>
>> >>> Thanks
>> >>> Dave
>
> ===8<==
> From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Date: Thu, 21 Dec 2017 19:14:23 +0900
> Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>
> ---
>  arch/arm64/mm/init.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 00e7b900ca41..8175db94257b 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>         struct memblock_region reg = {
>                 .size = 0,
>         };
> +       u64 idx;
> +       phys_addr_t start, end;
>
>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>
> -       if (reg.size)
> -               memblock_cap_memory_range(reg.base, reg.size);
> +       if (reg.size) {
> +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> +                                       &start, &end, NULL)
> +                       memblock_mark_nomap(start, end - start);
> +               memblock_clear_nomap(reg.base, reg.size);
> +       }
>  }
>
>  void __init arm64_memblock_init(void)
> --
> 2.15.1
>

Thanks for the patch. After applying it on top of
4.15.0-rc4-next-20171220, there seems to be an improvement: the
crashkernel boot no longer hangs while trying to access the ACPI
tables.
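
As a rough illustration of what the patch above changes, here is a
minimal userspace sketch in C. It is not kernel code: struct toy_region,
toy_inside() and toy_enforce_usable_range() are hypothetical names, and
the toy assumes the regions are already split so that the usable range is
its own entry (the real memblock helpers split regions as needed). Free
regions are marked NOMAP, the usable range is re-exposed, and reserved
regions (ACPI reclaim memory here) are left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one memblock.memory entry; not the kernel's struct. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory */
	bool nomap;	/* models the MEMBLOCK_NOMAP flag */
};

/* True if region r lies entirely inside [ubase, ubase + usize). */
static bool toy_inside(const struct toy_region *r,
		       uint64_t ubase, uint64_t usize)
{
	return r->base >= ubase && r->base + r->size <= ubase + usize;
}

/*
 * Models the patched fdt_enforce_memory_region(): hide every free
 * region, then re-expose the usable range. Reserved regions are
 * skipped by the first loop, so their flag is unchanged.
 */
static void toy_enforce_usable_range(struct toy_region *regs, size_t n,
				     uint64_t ubase, uint64_t usize)
{
	size_t i;

	assert(usize > 0);	/* the patch only acts when reg.size != 0 */

	/* for_each_free_mem_range() + memblock_mark_nomap() */
	for (i = 0; i < n; i++)
		if (!regs[i].reserved)
			regs[i].nomap = true;

	/* memblock_clear_nomap(reg.base, reg.size) */
	for (i = 0; i < n; i++)
		if (toy_inside(&regs[i], ubase, usize))
			regs[i].nomap = false;
}
```

With the crashkernel range from the kexec output quoted earlier (base
0x0e800000, size 0x20000000) passed as the usable range, only the free
regions outside it would end up NOMAP; a reserved entry keeps
nomap == false.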

However, I notice a minor issue. Please see the log below for
reference: the following message keeps spamming the console, although
the crashkernel boot proceeds further:

[    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
[    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
[    0.000000] NUMA: NODE_DATA(1) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
[    0.000000] NUMA: NODE_DATA(2) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
[    0.000000] NUMA: NODE_DATA(3) on node 0
[    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
[    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
[    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
[    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
[    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
[    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs

[snip..]
[    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

This WARNING message seems to come from vmemmap_verify() inside
'mm/sparse-vmemmap.c'
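
A rough model of that check, as a sketch in C under stated assumptions:
toy_node_distance() and toy_vmemmap_verify() are hypothetical userspace
stand-ins, and the distances 10 and 20 mirror the kernel's default
LOCAL_DISTANCE and REMOTE_DISTANCE. The idea is that the message is
printed whenever the page backing a range of struct pages sits on a node
farther than local from the node those struct pages describe. One
plausible reading of the log above is that the crash kernel can allocate
only from node 0 (note "NODE_DATA(1) on node 0"), so the struct pages for
the other nodes end up off-node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_LOCAL_DISTANCE	10
#define TOY_REMOTE_DISTANCE	20

/* Simplified distance table: same node is local, anything else remote. */
static int toy_node_distance(int a, int b)
{
	return a == b ? TOY_LOCAL_DISTANCE : TOY_REMOTE_DISTANCE;
}

/*
 * backing_node: node that holds the page backing the struct pages.
 * target_node:  node whose memory those struct pages describe.
 * Returns true (and prints the message) when the two are not local.
 */
static bool toy_vmemmap_verify(int backing_node, int target_node,
			       unsigned long long start,
			       unsigned long long end)
{
	assert(end > start);	/* [start, end) must be a non-empty range */

	if (toy_node_distance(backing_node, target_node) <= TOY_LOCAL_DISTANCE)
		return false;

	printf("[%llx-%llx] potential offnode page_structs\n",
	       start, end - 1);
	return true;
}
```

Calling toy_vmemmap_verify(0, 1, 0xffff7fe008000000ULL,
0xffff7fe008010000ULL) would print the first range from the log; each
reported range spans 0x10000 bytes (64 KiB).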

Regards,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-21 12:06                                                                           ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-21 12:06 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young,
	Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi,
	Mark Rutland, James Morse, kexec

Hello Akashi,

On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Bhupesh,
>
> Can you test the patch attached below, please?
>
> It is intended to retain already-reserved regions (ACPI reclaim memory
> in this case) in system ram (i.e. memblock.memory) without explicitly
> exporting them via usable-memory-range.
> (I still have to figure out what the side-effect of this patch is.)
>
> Thanks,
> -Takahiro AKASHI
>
> On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
>> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> > On 19 December 2017 at 07:09, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> >>> > > <takahiro.akashi@linaro.org> wrote:
>> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >>> > > >> <takahiro.akashi@linaro.org> wrote:
>> >>> > > >> > Bhupesh, Ard,
>> >>> > > >> >
>> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >>> > > >> >> Hi Ard, Akashi
>> >>> > > >> >>
>> >>> > > >> > (snip)
>> >>> > > >> >
>> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >>> > > >> >> , for details)
>> >>> > > >> >
>> >>> > > >> > Right.
>> >>> > > >> >
>> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >>> > > >> >> with the crashkernel memory range:
>> >>> > > >> >>
>> >>> > > >> >>                 /* add linux,usable-memory-range */
>> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>> > > >> >>                                 address_cells, size_cells);
>> >>> > > >> >>
>> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >>> > > >> >> , for details)
>> >>> > > >> >>
>> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >>> > > >> >> they are marked as System RAM or as RESERVED, because the
>> >>> > > >> >> 'linux,usable-memory-range' dt property is patched up only with
>> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
>> >>> > > >> >>
>> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >>> > > >> >> ACPI memory and crashes while trying to access the same:
>> >>> > > >> >>
>> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>> >>> > > >> >>
>> >>> > > >> >> [snip..]
>> >>> > > >> >>
>> >>> > > >> >> Reserved memory range
>> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >>> > > >> >>
>> >>> > > >> >> Coredump memory ranges
>> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >>> > > >> >>
>> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> > > >> >>
>> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >>> > > >> >> {
>> >>> > > >> >>         struct memblock_region reg = {
>> >>> > > >> >>                 .size = 0,
>> >>> > > >> >>         };
>> >>> > > >> >>
>> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> > > >> >>
>> >>> > > >> >>         if (reg.size)
>> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >>> > > >> >> }
>> >>> > > >> >
>> >>> > > >> > Please just don't do that. It can fatally damage the
>> >>> > > >> > memory contents of the *crashed* kernel.
>> >>> > > >> >
>> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >>> > > >> >>
>> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> > > >> >> fail.
>> >>> > > >> >>
>> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >>> > > >> >
>> >>> > > >> > I still don't understand why we need to carry over the information
>> >>> > > >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
>> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >>> > > >> > initialization. Why does the crash dump kernel need to know about them?
>> >>> > > >> >
>> >>> > > >>
>> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> > > >> kernel, those regions need to be preserved, which is why they are
>> >>> > > >> memblock_reserve()'d now.
>> >>> > > >
>> >>> > > > For my better understanding, who is actually accessing such regions
>> >>> > > > during boot time, uefi itself or efistub?
>> >>> > > >
>> >>> > >
>> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >>> > > instance, on QEMU we have
>> >>> > >
>> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >>> > >   01000013)
>> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >>> > > BXPC 00000001)
>> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >>> > > BXPC 00000001)
>> >>> > >
>> >>> > > covered by
>> >>> > >
>> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>> > >  ...
>> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>> >
>> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >>> > UEFI boot services.
>> >>> >
>> >>> > >
>> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> > > >> when booting the next kernel.
>> >>> > > >
>> >>> > > > not really.
>> >>> > > >
>> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> > > >> > on crash dump kernel?)
>> >>> > > >> >
>> >>> > > >>
>> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> > > >> regions only revealed the bug, not created it (given that other
>> >>> > > >> memblock_reserve regions may be affected as well)
>> >>> > > >
>> >>> > > > Since whether we should honor such reserved regions across kexec
>> >>> > > > depends on each one's specific nature, we will have to handle them one by one.
>> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >>> > > > exposed to user space (via proc/iomem).
>> >>> > > >
>> >>> > >
>> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >>> > > as 'System RAM'. Do you think that could solve this?
>> >>> >
>> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> >>> > marking them under another name in /proc/iomem would also be good in order
>> >>> > not to allocate them as part of crash kernel's memory.
>> >>> >
>> >>> > But I'm still not convinced that we should export them in
>> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
>> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >>> > (or memblocks), I guess.
>> >>> >     -> Bhupesh?
>> >>>
>> >>> I forgot how the arm64 kernel retrieves the memory ranges and initializes
>> >>> them.  If there is no "e820"-like interface, shouldn't the kernel reinitialize
>> >>> all the memory according to the efi memmap?  For the kdump kernel, anything
>> >>> other than usable memory (which comes from the dt node instead) should be
>> >>> reinitialized according to the efi-passed info, no?
>> >>
>> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> usable-memory-range by fdt_enforce_memory_region().
>> >>
>> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> with multiple entries in usable-memory-range.
>> >>
>> >
>> > In any case, the root of the problem is that memory regions lose their
>> > 'memory' annotation due to the way the memory map is mangled before
>> > being supplied to the kexec kernel.
>> >
>> > Would it be possible to classify all memory that we want to hide from
>> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> > so this seems to be the most appropriate way to deal with the host
>> > kernel's memory contents.
>>
>> Hmm. wouldn't appending the acpi reclaim regions to
>> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> be better? Because it's indirectly achieving a similar objective
>> (although may be a subset of all System RAM regions on the primary
>> kernel's memory).
>>
>> I am not aware of the background about the current kexec-tools
>> implementation where we add only the crashkernel range to the dtb
>> being passed to the crashkernel.
>>
>> Probably Akashi can answer better, as to how we arrived at this design
>> approach and why we didn't want to expose all System RAM regions (i.e.
>> ! NOMAP regions) to the crashkernel.
>>
>> I am suspecting that some issues were seen/met when the System RAM (!
>> NOMAP regions) were exposed to the crashkernel, and that's why we
>> finalized on this design approach, but this is something which is just
>> my guess.
>>
>> Regards,
>> Bhupesh
>>
>> >>> >
>> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
>> >>> > via a kernel command line parameter, "memmap=".
>> >>>
>> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >>> e820 table.
>> >>
>> >> Thanks. I remember that you have explained it before.
>> >>
>> >> -Takahiro AKASHI
>> >>
>> >>> [snip]
>> >>>
>> >>> Thanks
>> >>> Dave
>
> ===8<==
> From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Date: Thu, 21 Dec 2017 19:14:23 +0900
> Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>
> ---
>  arch/arm64/mm/init.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 00e7b900ca41..8175db94257b 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>         struct memblock_region reg = {
>                 .size = 0,
>         };
> +       u64 idx;
> +       phys_addr_t start, end;
>
>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>
> -       if (reg.size)
> -               memblock_cap_memory_range(reg.base, reg.size);
> +       if (reg.size) {
> +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> +                                       &start, &end, NULL)
> +                       memblock_mark_nomap(start, end - start);
> +               memblock_clear_nomap(reg.base, reg.size);
> +       }
>  }
>
>  void __init arm64_memblock_init(void)
> --
> 2.15.1
>

Thanks for the patch. After applying this on top of
4.15.0-rc4-next-20171220, there seems to be an improvement and the
crashkernel boot no longer hangs while trying to access the ACPI
tables.
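
As a sanity check of what the patch changes, its effect on the memblock
view can be modelled in user space. This is only a sketch under a
simplified model: 'struct range', 'enum mem_state', in_any() and
classify() are hypothetical names, nothing here calls the real memblock
API, and the reserved (ACPI reclaim) range used in the note below is an
assumed example rather than a value read from this machine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

enum mem_state {
	MEM_ABSENT,	/* not part of memblock.memory at all */
	MEM_NOMAP,	/* in memblock.memory, but flagged NOMAP */
	MEM_MAPPED,	/* in memblock.memory and not NOMAP */
};

static bool in_any(const struct range *r, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= r[i].start && addr < r[i].end)
			return true;
	return false;
}

/*
 * Model of the patched fdt_enforce_memory_region():
 *   step 1: every free range (memory minus reserved) is marked NOMAP;
 *   step 2: NOMAP is cleared again on the usable-memory-range.
 * Reserved ranges are never touched, so they are not NOMAP.
 */
static enum mem_state classify(const struct range *memory, size_t n_mem,
			       const struct range *reserved, size_t n_res,
			       struct range usable, uint64_t addr)
{
	if (!in_any(memory, n_mem, addr))
		return MEM_ABSENT;

	bool nomap = !in_any(reserved, n_res, addr);	/* step 1 */

	if (addr >= usable.start && addr < usable.end)	/* step 2 */
		nomap = false;

	return nomap ? MEM_NOMAP : MEM_MAPPED;
}
```

With memory [0x0, 0x40000000), an assumed reserved range
[0x39620000, 0x39d40000) and the crashkernel range
[0x0e800000, 0x2e800000) as usable, classify() reports the reserved
pages and the crashkernel range as MEM_MAPPED and the rest of the first
kernel's RAM as MEM_NOMAP, so the ACPI tables stay reachable in this
model.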

However I notice a minor issue. Please see the log below for
reference: the following message keeps spamming the console, but I see
the crashkernel boot proceed further:

[    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
[    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
[    0.000000] NUMA: NODE_DATA(1) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
[    0.000000] NUMA: NODE_DATA(2) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
[    0.000000] NUMA: NODE_DATA(3) on node 0
[    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
[    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
[    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
[    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
[    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
[    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs

[snip..]
[    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

This warning message seems to come from vmemmap_verify() in
'mm/sparse-vmemmap.c'.

Regards,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-21 12:06                                                                           ` Bhupesh Sharma
  (?)
@ 2017-12-22  8:33                                                                               ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-22  8:33 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> Hello Akashi,
> 
> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > Bhupesh,
> >
> > Can you test the patch attached below, please?
> >
> > It is intended to retain already-reserved regions (ACPI reclaim memory
> > in this case) in system ram (i.e. memblock.memory) without explicitly
> > exporting them via usable-memory-range.
> > (I still have to figure out what the side-effect of this patch is.)
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >>> > > >> > Bhupesh, Ard,
> >> >>> > > >> >
> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> > > >> >> Hi Ard, Akashi
> >> >>> > > >> >>
> >> >>> > > >> > (snip)
> >> >>> > > >> >
> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >>> > > >> >> , for details)
> >> >>> > > >> >
> >> >>> > > >> > Right.
> >> >>> > > >> >
> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> > > >> >> with the crashkernel memory range:
> >> >>> > > >> >>
> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >>> > > >> >>
> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >>> > > >> >> , for details)
> >> >>> > > >> >>
> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> > > >> >>
> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >>> > > >> >>
> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >>> > > >> >> -r`.img --reuse-cmdline -d
> >> >>> > > >> >>
> >> >>> > > >> >> [snip..]
> >> >>> > > >> >>
> >> >>> > > >> >> Reserved memory range
> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> Coredump memory ranges
> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> > > >> >>
> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> > > >> >> {
> >> >>> > > >> >>         struct memblock_region reg = {
> >> >>> > > >> >>                 .size = 0,
> >> >>> > > >> >>         };
> >> >>> > > >> >>
> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> > > >> >>
> >> >>> > > >> >>         if (reg.size)
> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >>> > > >> >> comment this out */
> >> >>> > > >> >> }
> >> >>> > > >> >
> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >>> > > >> >
> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> > > >> >>
> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> > > >> >> fail.
> >> >>> > > >> >>
> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >>> > > >> >
> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are
> >> >>> > > >> memblock_reserve()'d now.
> >> >>> > > >
> >> >>> > > > For my better understandings, who is actually accessing such regions
> >> >>> > > > during boot time, uefi itself or efistub?
> >> >>> > > >
> >> >>> > >
> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> > > instance, on QEMU we have
> >> >>> > >
> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >>> > >   01000013)
> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >
> >> >>> > > covered by
> >> >>> > >
> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>> > >  ...
> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>> >
> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >>> > UEFI boot services.
> >> >>> >
> >> >>> > >
> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >>> > > >> when booting the next kernel.
> >> >>> > > >
> >> >>> > > > not really.
> >> >>> > > >
> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >>> > > >> > on crash dump kernel?)
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >>> > > >
> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >>> > > > exposed to user space (via proc/iomem).
> >> >>> > > >
> >> >>> > >
> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >>> >
> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >>> >
> >> >>> > But I'm not still convinced that we should export them in useable-
> >> >>> > memory-range to crash dump kernel. They will be accessed through
> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >>> > (or memblocks), I guess.
> >> >>> >     -> Bhupesh?
> >> >>>
> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >> >>> than usable memory (which is from the dt node instead) should be
> >> >>> reinitialized according to efi passed info, no?
> >> >>
> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >>
> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> with multiple entries in usable-memory-range.
> >> >>
> >> >
> >> > In any case, the root of the problem is that memory regions lose their
> >> > 'memory' annotation due to the way the memory map is mangled before
> >> > being supplied to the kexec kernel.
> >> >
> >> > Would it be possible to classify all memory that we want to hide from
> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> > so this seems to be the most appropriate way to deal with the host
> >> > kernel's memory contents.
> >>
> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> be better? Because its indirectly achieving a similar objective
> >> (although may be a subset of all System RAM regions on the primary
> >> kernel's memory).
> >>
> >> I am not aware of the background about the current kexec-tools
> >> implementation where we add only the crashkernel range to the dtb
> >> being passed to the crashkernel.
> >>
> >> Probably Akashi can answer better, as to how we arrived at this design
> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> ! NOMAP regions) to the crashkernel.
> >>
> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> finalized on this design approach, but this is something which is just
> >> my guess.
> >>
> >> Regards,
> >> Bhupesh
> >>
> >> >>> >
> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >>> > via a kernel command line parameter, "memmap=".
> >> >>>
> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >>> e820 table.
> >> >>
> >> >> Thanks. I remember that you have explained it before.
> >> >>
> >> >> -Takahiro AKASHI
> >> >>
> >> >>> [snip]
> >> >>>
> >> >>> Thanks
> >> >>> Dave
> >
> > ===8<==
> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >
> > ---
> >  arch/arm64/mm/init.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 00e7b900ca41..8175db94257b 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >         struct memblock_region reg = {
> >                 .size = 0,
> >         };
> > +       u64 idx;
> > +       phys_addr_t start, end;
> >
> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >
> > -       if (reg.size)
> > -               memblock_cap_memory_range(reg.base, reg.size);
> > +       if (reg.size) {
> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> > +                                       &start, &end, NULL)
> > +                       memblock_mark_nomap(start, end - start);
> > +               memblock_clear_nomap(reg.base, reg.size);
> > +       }
> >  }
> >
> >  void __init arm64_memblock_init(void)
> > --
> > 2.15.1
> >
> 
> Thanks for the patch. After applying this on top of
> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> crashkernel boot no longer hangs while trying to access the ACPI
> tables.
> 
> However I notice a minor issue. Please see the log below for
> reference: the following message keeps spamming the console, but I see
> the crashkernel boot proceed further:
> 
> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> [    0.000000] NUMA: NODE_DATA(1) on node 0
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> [    0.000000] NUMA: NODE_DATA(2) on node 0
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> [    0.000000] NUMA: NODE_DATA(3) on node 0
> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> 
> [snip..]
> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

These messages show that some "struct page" data are allocated on remote
(NUMA) nodes.
Since on your crash dump kernel all the usable system memory (starting
at 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.

My best guess is that you can ignore them, apart from some performance penalty.
This may be one side-effect.
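
The check behind this message can be sketched roughly as follows. This
is a user-space model written from my reading of vmemmap_verify(), not a
copy of it: backing_page_nid_model() and node_distance_model() are
hypothetical stand-ins for early_pfn_to_nid() and node_distance(), and
the distance values are assumptions of the model.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOCAL_DISTANCE_MODEL	10	/* assumed distance for "same node" */
#define REMOTE_DISTANCE_MODEL	20	/* assumed distance for "other node" */

/* Hypothetical stand-in for node_distance(). */
static int node_distance_model(int from, int to)
{
	return from == to ? LOCAL_DISTANCE_MODEL : REMOTE_DISTANCE_MODEL;
}

/*
 * Hypothetical stand-in for early_pfn_to_nid(): in this model the only
 * usable memory (the crashkernel range) sits on node 0, so every page
 * that can back a struct page array is a node 0 page.
 */
static int backing_page_nid_model(uint64_t pfn)
{
	(void)pfn;
	return 0;
}

/*
 * Returns true when the struct pages describing memory on
 * 'wanted_node' are backed by a page from a different node, i.e. when
 * the "potential offnode page_structs" message would be printed.
 */
static bool vmemmap_is_offnode(uint64_t backing_pfn, int wanted_node)
{
	int actual_node = backing_page_nid_model(backing_pfn);

	return node_distance_model(actual_node, wanted_node) >
	       LOCAL_DISTANCE_MODEL;
}
```

In this model the struct pages for Node#0 memory are local, while those
for Node#1 to Node#3 are always reported as off-node, because there is
no usable memory on those nodes to allocate from.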

So does your crash dump kernel now boot successfully?

Thanks,
-Takahiro AKASHI

> This WARNING message seems to come from vmemmap_verify() inside
> 'mm/sparse-vmemmap.c'
> 
> Regards,
> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-22  8:33                                                                               ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-22  8:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> Hello Akashi,
> 
> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > Bhupesh,
> >
> > Can you test the patch attached below, please?
> >
> > It is intended to retain already-reserved regions (ACPI reclaim memory
> > in this case) in system ram (i.e. memblock.memory) without explicitly
> > exporting them via usable-memory-range.
> > (I still have to figure out what the side-effect of this patch is.)
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >>> > > >> > Bhupesh, Ard,
> >> >>> > > >> >
> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> > > >> >> Hi Ard, Akashi
> >> >>> > > >> >>
> >> >>> > > >> > (snip)
> >> >>> > > >> >
> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >>> > > >> >> , for details)
> >> >>> > > >> >
> >> >>> > > >> > Right.
> >> >>> > > >> >
> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> > > >> >> with the crashkernel memory range:
> >> >>> > > >> >>
> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >>> > > >> >>
> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >>> > > >> >> , for details)
> >> >>> > > >> >>
> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> > > >> >>
> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >>> > > >> >>
> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >>> > > >> >> -r`.img --reuse-cmdline -d
> >> >>> > > >> >>
> >> >>> > > >> >> [snip..]
> >> >>> > > >> >>
> >> >>> > > >> >> Reserved memory range
> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> Coredump memory ranges
> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> > > >> >>
> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> > > >> >> {
> >> >>> > > >> >>         struct memblock_region reg = {
> >> >>> > > >> >>                 .size = 0,
> >> >>> > > >> >>         };
> >> >>> > > >> >>
> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> > > >> >>
> >> >>> > > >> >>         if (reg.size)
> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >>> > > >> >> comment this out */
> >> >>> > > >> >> }
> >> >>> > > >> >
> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >>> > > >> >
> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> > > >> >>
> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> > > >> >> fail.
> >> >>> > > >> >>
> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >>> > > >> >
> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are
> >> >>> > > >> memblock_reserve()'d now.
> >> >>> > > >
> >> >>> > > > For my better understandings, who is actually accessing such regions
> >> >>> > > > during boot time, uefi itself or efistub?
> >> >>> > > >
> >> >>> > >
> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> > > instance, on QEMU we have
> >> >>> > >
> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >>> > >   01000013)
> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >
> >> >>> > > covered by
> >> >>> > >
> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>> > >  ...
> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>> >
> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >>> > UEFI boot services.
> >> >>> >
> >> >>> > >
> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >>> > > >> when booting the next kernel.
> >> >>> > > >
> >> >>> > > > not really.
> >> >>> > > >
> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >>> > > >> > on crash dump kernel?)
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >>> > > >
> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >>> > > > exposed to user space (via proc/iomem).
> >> >>> > > >
> >> >>> > >
> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >>> >
> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >>> >
> >> >>> > But I'm not still convinced that we should export them in useable-
> >> >>> > memory-range to crash dump kernel. They will be accessed through
> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >>> > (or memblocks), I guess.
> >> >>> >     -> Bhupesh?
> >> >>>
> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >> >>> than usable memory (which is from the dt node instead) should be
> >> >>> reinitialized according to efi passed info, no?
> >> >>
> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >>
> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> with multiple entries in usable-memory-range.
> >> >>
> >> >
> >> > In any case, the root of the problem is that memory regions lose their
> >> > 'memory' annotation due to the way the memory map is mangled before
> >> > being supplied to the kexec kernel.
> >> >
> >> > Would it be possible to classify all memory that we want to hide from
> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> > so this seems to be the most appropriate way to deal with the host
> >> > kernel's memory contents.
> >>
> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> be better? Because its indirectly achieving a similar objective
> >> (although may be a subset of all System RAM regions on the primary
> >> kernel's memory).
> >>
> >> I am not aware of the background about the current kexec-tools
> >> implementation where we add only the crashkernel range to the dtb
> >> being passed to the crashkernel.
> >>
> >> Probably Akashi can answer better, as to how we arrived at this design
> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> ! NOMAP regions) to the crashkernel.
> >>
> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> finalized on this design approach, but this is something which is just
> >> my guess.
> >>
> >> Regards,
> >> Bhupesh
> >>
> >> >>> >
> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >>> > via a kernel command line parameter, "memmap=".
> >> >>>
> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >>> e820 table.
> >> >>
> >> >> Thanks. I remember that you have explained it before.
> >> >>
> >> >> -Takahiro AKASHI
> >> >>
> >> >>> [snip]
> >> >>>
> >> >>> Thanks
> >> >>> Dave
> >
> > ===8<==
> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >
> > ---
> >  arch/arm64/mm/init.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 00e7b900ca41..8175db94257b 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >         struct memblock_region reg = {
> >                 .size = 0,
> >         };
> > +       u64 idx;
> > +       phys_addr_t start, end;
> >
> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >
> > -       if (reg.size)
> > -               memblock_cap_memory_range(reg.base, reg.size);
> > +       if (reg.size) {
> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> > +                                       &start, &end, NULL)
> > +                       memblock_mark_nomap(start, end - start);
> > +               memblock_clear_nomap(reg.base, reg.size);
> > +       }
> >  }
> >
> >  void __init arm64_memblock_init(void)
> > --
> > 2.15.1
> >
> 
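
For readers following the quoted patch, a minimal user-space sketch of its
effect is given below. It is only a toy model, not kernel code: struct
toy_region, toy_within() and toy_enforce_usable_range() are hypothetical
names, and the map is assumed to be pre-split so that each region lies either
wholly inside or wholly outside the usable range (the real memblock code
splits regions as needed). What it tries to show is that free regions outside
usable-memory-range become NOMAP, while already-reserved regions such as ACPI
reclaim memory are left mapped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memblock region; NOT the kernel's struct memblock_region. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models membership in memblock.reserved */
	bool nomap;	/* models the MEMBLOCK_NOMAP flag */
};

/* True if [base, base + size) lies entirely inside [lo, lo + len). */
static bool toy_within(uint64_t base, uint64_t size, uint64_t lo, uint64_t len)
{
	return base >= lo && base + size <= lo + len;
}

/*
 * Models the patched fdt_enforce_memory_region():
 *  1. every free (non-reserved) region is marked NOMAP;
 *  2. NOMAP is cleared again on the usable range.
 * Reserved regions are never touched, so they stay mapped.
 */
static void toy_enforce_usable_range(struct toy_region *r, size_t n,
				     uint64_t usable_base, uint64_t usable_size)
{
	size_t i;

	if (!usable_size)
		return;	/* no usable-memory-range: leave the map alone */

	for (i = 0; i < n; i++)
		if (!r[i].reserved)
			r[i].nomap = true;

	for (i = 0; i < n; i++)
		if (toy_within(r[i].base, r[i].size, usable_base, usable_size))
			r[i].nomap = false;
}
```

Calling toy_enforce_usable_range() with the crash kernel range from the log
in this thread (base 0x0e800000, size 0x20000000) would leave that range and
any reserved region mapped, and mark every other free region NOMAP.
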
> Thanks for the patch. After applying this on top of
> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> crashkernel boot no longer hangs while trying to access the acpi
> tables.
> 
> However I notice a minor issue. Please see the log below for
> reference, the following message keeps spamming the console but I see
> the crashkernel boot proceed further.:
> 
> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> [    0.000000] NUMA: NODE_DATA(1) on node 0
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> [    0.000000] NUMA: NODE_DATA(2) on node 0
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> [    0.000000] NUMA: NODE_DATA(3) on node 0
> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> 
> [snip..]
> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

These messages show that some "struct page" data are allocated on remote
(NUMA) nodes.
Since all the usable system memory on your crash dump kernel (starting at
0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.

My best guess is that you can ignore them, apart from some performance penalty.
This may be one side effect of the patch.
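
The condition behind this warning can be sketched as follows. This is a
simplified user-space model, assuming the check compares the NUMA distance
between the node backing a block of struct pages and the node whose pages
that block describes against the local distance. toy_node_distance() and
toy_offnode_page_structs() are hypothetical names, and the distance values
are illustrative only.

```c
#include <stdbool.h>

#define TOY_LOCAL_DISTANCE  10	/* distance of a node to itself */
#define TOY_REMOTE_DISTANCE 20	/* hypothetical distance between distinct nodes */

/* Hypothetical flat distance model; a real system derives this from firmware tables. */
static int toy_node_distance(int from, int to)
{
	return from == to ? TOY_LOCAL_DISTANCE : TOY_REMOTE_DISTANCE;
}

/*
 * Returns true when the memory backing a block of struct pages
 * (backing_node) is not local to the node the pages describe
 * (target_node), i.e. when the warning would be printed in this model.
 */
static bool toy_offnode_page_structs(int backing_node, int target_node)
{
	return toy_node_distance(backing_node, target_node) > TOY_LOCAL_DISTANCE;
}
```

With all usable memory on node 0, toy_offnode_page_structs(0, n) is true for
n = 1, 2, 3, which would be consistent with the messages being printed for
memory that belongs to the other nodes.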

So does your crash dump kernel now boot successfully?

Thanks,
-Takahiro AKASHI

> This WARNING message seems to come from vmemmap_verify() inside
> 'mm/sparse-vmemmap.c'
> 
> Regards,
> Bhupesh


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-22  8:33                                                                               ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-22  8:33 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec,
	James Morse, Bhupesh SHARMA, Dave Young, linux-arm-kernel

On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> Hello Akashi,
> 
> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > Bhupesh,
> >
> > Can you test the patch attached below, please?
> >
> > It is intended to retain already-reserved regions (ACPI reclaim memory
> > in this case) in system ram (i.e. memblock.memory) without explicitly
> > exporting them via usable-memory-range.
> > (I still have to figure out what the side-effect of this patch is.)
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >>> > > >> > Bhupesh, Ard,
> >> >>> > > >> >
> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> > > >> >> Hi Ard, Akashi
> >> >>> > > >> >>
> >> >>> > > >> > (snip)
> >> >>> > > >> >
> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >>> > > >> >> , for details)
> >> >>> > > >> >
> >> >>> > > >> > Right.
> >> >>> > > >> >
> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> > > >> >> with the crashkernel memory range:
> >> >>> > > >> >>
> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >>> > > >> >>
> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >>> > > >> >> , for details)
> >> >>> > > >> >>
> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> > > >> >>
> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >>> > > >> >>
> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> >>> > > >> >>
> >> >>> > > >> >> [snip..]
> >> >>> > > >> >>
> >> >>> > > >> >> Reserved memory range
> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> Coredump memory ranges
> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> > > >> >>
> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> > > >> >> {
> >> >>> > > >> >>         struct memblock_region reg = {
> >> >>> > > >> >>                 .size = 0,
> >> >>> > > >> >>         };
> >> >>> > > >> >>
> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> > > >> >>
> >> >>> > > >> >>         if (reg.size)
> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >>> > > >> >> }
> >> >>> > > >> >
> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >>> > > >> >
> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> > > >> >>
> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> > > >> >> fail.
> >> >>> > > >> >>
> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >>> > > >> >
> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are
> >> >>> > > >> memblock_reserve()'d now.
> >> >>> > > >
> >> >>> > > > For my better understandings, who is actually accessing such regions
> >> >>> > > > during boot time, uefi itself or efistub?
> >> >>> > > >
> >> >>> > >
> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> > > instance, on QEMU we have
> >> >>> > >
> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >>> > >   01000013)
> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >>> > > BXPC 00000001)
> >> >>> > >
> >> >>> > > covered by
> >> >>> > >
> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>> > >  ...
> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>> >
> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >>> > UEFI boot services.
> >> >>> >
> >> >>> > >
> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >>> > > >> when booting the next kernel.
> >> >>> > > >
> >> >>> > > > not really.
> >> >>> > > >
> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >>> > > >> > on crash dump kernel?)
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >>> > > >
> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >>> > > > exposed to user space (via proc/iomem).
> >> >>> > > >
> >> >>> > >
> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >>> >
> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >>> >
> >> >>> > But I'm still not convinced that we should export them in
> >> >>> > usable-memory-range to crash dump kernel. They will be accessed through
> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >>> > (or memblocks), I guess.
> >> >>> >     -> Bhupesh?
> >> >>>
> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >> >>> than usable memory (which is from the dt node instead) should be
> >> >>> reinitialized according to efi passed info, no?
> >> >>
> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >>
> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> with multiple entries in usable-memory-range.
> >> >>
> >> >
> >> > In any case, the root of the problem is that memory regions lose their
> >> > 'memory' annotation due to the way the memory map is mangled before
> >> > being supplied to the kexec kernel.
> >> >
> >> > Would it be possible to classify all memory that we want to hide from
> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> > so this seems to be the most appropriate way to deal with the host
> >> > kernel's memory contents.
> >>
> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> be better? Because its indirectly achieving a similar objective
> >> (although may be a subset of all System RAM regions on the primary
> >> kernel's memory).
> >>
> >> I am not aware of the background about the current kexec-tools
> >> implementation where we add only the crashkernel range to the dtb
> >> being passed to the crashkernel.
> >>
> >> Probably Akashi can answer better, as to how we arrived at this design
> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> ! NOMAP regions) to the crashkernel.
> >>
> >> I am suspecting that some issues were seen/meet when the System RAM (!
> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> finalized on this design approach, but this is something which is just
> >> my guess.
> >>
> >> Regards,
> >> Bhupesh
> >>
> >> >>> >
> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >>> > via a kernel command line parameter, "memmap=".
> >> >>>
> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >>> e820 table.
> >> >>
> >> >> Thanks. I remember that you have explained it before.
> >> >>
> >> >> -Takahiro AKASHI
> >> >>
> >> >>> [snip]
> >> >>>
> >> >>> Thanks
> >> >>> Dave
> >
> > ===8<==
> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >
> > ---
> >  arch/arm64/mm/init.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 00e7b900ca41..8175db94257b 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >         struct memblock_region reg = {
> >                 .size = 0,
> >         };
> > +       u64 idx;
> > +       phys_addr_t start, end;
> >
> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >
> > -       if (reg.size)
> > -               memblock_cap_memory_range(reg.base, reg.size);
> > +       if (reg.size) {
> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> > +                                       &start, &end, NULL)
> > +                       memblock_mark_nomap(start, end - start);
> > +               memblock_clear_nomap(reg.base, reg.size);
> > +       }
> >  }
> >
> >  void __init arm64_memblock_init(void)
> > --
> > 2.15.1
> >
> 
> Thanks for the patch. After applying this on top of
> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> crashkernel boot no longer hangs while trying to access the acpi
> tables.
> 
> However I notice a minor issue. Please see the log below for
> reference, the following message keeps spamming the console but I see
> the crashkernel boot proceed further.:
> 
> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> [    0.000000] NUMA: NODE_DATA(1) on node 0
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> [    0.000000] NUMA: NODE_DATA(2) on node 0
> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> [    0.000000] NUMA: NODE_DATA(3) on node 0
> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> 
> [snip..]
> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

These messages show that some "struct page" data are allocated on remote
(NUMA) nodes.
Since all the usable system memory on your crash dump kernel (starting at
0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.

My best guess is that you can ignore them, apart from some performance penalty.
This may be one side effect of the patch.

So does your crash dump kernel now boot successfully?

Thanks,
-Takahiro AKASHI

> This WARNING message seems to come from vmemmap_verify() inside
> 'mm/sparse-vmemmap.c'
> 
> Regards,
> Bhupesh



* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-22  8:33                                                                               ` AKASHI Takahiro
  (?)
@ 2017-12-23 19:51                                                                                 ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-23 19:51 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young,
	Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
>> Hello Akashi,
>>
>> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > Bhupesh,
>> >
>> > Can you test the patch attached below, please?
>> >
>> > It is intended to retain already-reserved regions (ACPI reclaim memory
>> > in this case) in system ram (i.e. memblock.memory) without explicitly
>> > exporting them via usable-memory-range.
>> > (I still have to figure out what the side-effect of this patch is.)
>> >
>> > Thanks,
>> > -Takahiro AKASHI
>> >
>> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
>> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
>> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
>> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> >>> > > >> > Bhupesh, Ard,
>> >> >>> > > >> >
>> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> >>> > > >> >> Hi Ard, Akashi
>> >> >>> > > >> >>
>> >> >>> > > >> > (snip)
>> >> >>> > > >> >
>> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> >>> > > >> >> , for details)
>> >> >>> > > >> >
>> >> >>> > > >> > Right.
>> >> >>> > > >> >
>> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> >>> > > >> >> with the crashkernel memory range:
>> >> >>> > > >> >>
>> >> >>> > > >> >>                 /* add linux,usable-memory-range */
>> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> >>> > > >> >>                                 address_cells, size_cells);
>> >> >>> > > >> >>
>> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> >>> > > >> >> , for details)
>> >> >>> > > >> >>
>> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
>> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >> >>> > > >> >>
>> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
>> >> >>> > > >> >>
>> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>> >> >>> > > >> >>
>> >> >>> > > >> >> [snip..]
>> >> >>> > > >> >>
>> >> >>> > > >> >> Reserved memory range
>> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> Coredump memory ranges
>> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >>> > > >> >>
>> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >>> > > >> >> {
>> >> >>> > > >> >>         struct memblock_region reg = {
>> >> >>> > > >> >>                 .size = 0,
>> >> >>> > > >> >>         };
>> >> >>> > > >> >>
>> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >>> > > >> >>
>> >> >>> > > >> >>         if (reg.size)
>> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >> >>> > > >> >> }
>> >> >>> > > >> >
>> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >>> > > >> >
>> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >>> > > >> >> fail.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >> >>> > > >> >
>> >> >>> > > >> > I still don't understand why we need to carry over the information
>> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
>> >> >>> > > >> >
>> >> >>> > > >>
>> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are
>> >> >>> > > >> memblock_reserve()'d now.
>> >> >>> > > >
>> >> >>> > > > For my better understandings, who is actually accessing such regions
>> >> >>> > > > during boot time, uefi itself or efistub?
>> >> >>> > > >
>> >> >>> > >
>> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >> >>> > > instance, on QEMU we have
>> >> >>> > >
>> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> >>> > >   01000013)
>> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> >>> > > BXPC 00000001)
>> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> >>> > > BXPC 00000001)
>> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> >>> > > BXPC 00000001)
>> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> >>> > > BXPC 00000001)
>> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> >>> > > BXPC 00000001)
>> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> >>> > > BXPC 00000001)
>> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> >>> > > BXPC 00000001)
>> >> >>> > >
>> >> >>> > > covered by
>> >> >>> > >
>> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> >>> > >  ...
>> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> >>> >
>> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >> >>> > UEFI boot services.
>> >> >>> >
>> >> >>> > >
>> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >> >>> > > >> when booting the next kernel.
>> >> >>> > > >
>> >> >>> > > > not really.
>> >> >>> > > >
>> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> >>> > > >> > on crash dump kernel?)
>> >> >>> > > >> >
>> >> >>> > > >>
>> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >> >>> > > >> regions only revealed the bug, not created it (given that other
>> >> >>> > > >> memblock_reserve regions may be affected as well)
>> >> >>> > > >
>> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
>> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
>> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >> >>> > > > exposed to user space (via proc/iomem).
>> >> >>> > > >
>> >> >>> > >
>> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >> >>> > > as 'System RAM'. Do you think that could solve this?
>> >> >>> >
>> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> >>> > marking them under another name in /proc/iomem would also be good in order
>> >> >>> > not to allocate them as part of crash kernel's memory.
>> >> >>> >
>> >> >>> > But I'm still not convinced that we should export them in
>> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
>> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >> >>> > (or memblocks), I guess.
>> >> >>> >     -> Bhupesh?
>> >> >>>
>> >> >>> I forgot how the arm64 kernel retrieves the memory ranges and initializes
>> >> >>> them.  If there is no "e820"-like interface, shouldn't the kernel reinitialize
>> >> >>> all the memory according to the efi memmap?  For the kdump kernel, anything
>> >> >>> other than usable memory (which comes from the dt node instead) should be
>> >> >>> reinitialized according to the efi passed info, no?
>> >> >>
>> >> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> >> usable-memory-range by fdt_enforce_memory_region().
>> >> >>
>> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> >> with multiple entries in usable-memory-range.
>> >> >>
>> >> >
>> >> > In any case, the root of the problem is that memory regions lose their
>> >> > 'memory' annotation due to the way the memory map is mangled before
>> >> > being supplied to the kexec kernel.
>> >> >
>> >> > Would it be possible to classify all memory that we want to hide from
>> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> >> > so this seems to be the most appropriate way to deal with the host
>> >> > kernel's memory contents.
>> >>
>> >> Hmm. Wouldn't appending the acpi reclaim regions to
>> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> >> be better? It indirectly achieves a similar objective
>> >> (although it may be a subset of all System RAM regions in the primary
>> >> kernel's memory).
>> >>
>> >> I am not aware of the background about the current kexec-tools
>> >> implementation where we add only the crashkernel range to the dtb
>> >> being passed to the crashkernel.
>> >>
>> >> Probably Akashi can answer better, as to how we arrived at this design
>> >> approach and why we didn't want to expose all System RAM regions (i.e.
>> >> !NOMAP regions) to the crashkernel.
>> >>
>> >> I suspect that some issues were seen when the System RAM (!NOMAP
>> >> regions) was exposed to the crashkernel, and that's why we
>> >> settled on this design approach, but this is just
>> >> my guess.
>> >>
>> >> Regards,
>> >> Bhupesh
>> >>
>> >> >>> >
>> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
>> >> >>> > via a kernel command line parameter, "memmap=".
>> >> >>>
>> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >> >>> e820 table.
>> >> >>
>> >> >> Thanks. I remember that you have explained it before.
>> >> >>
>> >> >> -Takahiro AKASHI
>> >> >>
>> >> >>> [snip]
>> >> >>>
>> >> >>> Thanks
>> >> >>> Dave
>> >
>> > ===8<==
>> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >
>> > ---
>> >  arch/arm64/mm/init.c | 10 ++++++++--
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> > index 00e7b900ca41..8175db94257b 100644
>> > --- a/arch/arm64/mm/init.c
>> > +++ b/arch/arm64/mm/init.c
>> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >         struct memblock_region reg = {
>> >                 .size = 0,
>> >         };
>> > +       u64 idx;
>> > +       phys_addr_t start, end;
>> >
>> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >
>> > -       if (reg.size)
>> > -               memblock_cap_memory_range(reg.base, reg.size);
>> > +       if (reg.size) {
>> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> > +                                       &start, &end, NULL)
>> > +                       memblock_mark_nomap(start, end - start);
>> > +               memblock_clear_nomap(reg.base, reg.size);
>> > +       }
>> >  }
>> >
>> >  void __init arm64_memblock_init(void)
>> > --
>> > 2.15.1
>> >
>>
>> Thanks for the patch. After applying this on top of
>> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
>> crashkernel boot no longer hangs while trying to access the acpi
>> tables.
>>
>> However, I notice a minor issue. Please see the log below for
>> reference; the following message keeps spamming the console, but I see
>> the crashkernel boot proceed further:
>>
>> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
>> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
>> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
>> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
>> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
>> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
>> [    0.000000] NUMA: NODE_DATA(1) on node 0
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
>> [    0.000000] NUMA: NODE_DATA(2) on node 0
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
>> [    0.000000] NUMA: NODE_DATA(3) on node 0
>> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
>> page_structs
>> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
>> page_structs
>> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
>> page_structs
>> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
>> page_structs
>> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
>> page_structs
>> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
>> page_structs
>>
>> [snip..]
>> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
>> page_structs
>
> These messages show that some "struct page" data are allocated on remote
> (numa) nodes.
> Since on your crash dump kernel, all the usable system memory (starting
> at 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
>
> In my best guess, you can ignore them except for some performance penalty.
> This may be one side-effect.
>
> So does your crash dump kernel now boot successfully?
>

Indeed. The crash dump kernel now boots successfully and the crash
dump core can be saved properly as well (I tried saving it to local
disk).

However, the 'potential offnode page_structs' WARN messages hog the
console and delay crashkernel boot for a significant duration, which
can be irritating.

Can we also consider ratelimiting this WARNING message [which seems to
come from vmemmap_verify()] if invoked in the context of the crash kernel,
in addition to making the above change suggested by you?
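
To illustrate what ratelimiting would mean here, below is a minimal
userspace sketch of an interval/burst limiter. It is only a model under
stated assumptions: the names msg_ratelimit and msg_ratelimit_allow() are
hypothetical and are not kernel APIs, and time is an abstract tick counter
supplied by the caller. In the kernel itself, the existing helpers such as
printk_ratelimited() or pr_warn_ratelimited() would presumably be the
starting point rather than new code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of an interval/burst rate limiter.
 * None of these names exist in the kernel; they are for illustration only.
 */
struct msg_ratelimit {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages allowed in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return true if a message may be emitted at tick 'now'. */
bool msg_ratelimit_allow(struct msg_ratelimit *rl, long now)
{
	assert(rl != NULL);

	/* Open a fresh window once the current one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}

	/* Within the burst budget: let the message through. */
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Budget exhausted: count the suppressed message and drop it. */
	rl->missed++;
	return false;
}
```

A caller would guard each 'potential offnode page_structs' style message
with msg_ratelimit_allow() and print only when it returns true, so a long
run of identical warnings costs at most 'burst' console writes per window.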

Thanks for the help.

Regards,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-23 19:51                                                                                 ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-23 19:51 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young,
	Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi,
	Mark Rutland, James Morse, kexec

On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
>> Hello Akashi,
>>
>> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>> > Bhupesh,
>> >
>> > Can you test the patch attached below, please?
>> >
>> > It is intended to retain already-reserved regions (ACPI reclaim memory
>> > in this case) in system ram (i.e. memblock.memory) without explicitly
>> > exporting them via usable-memory-range.
>> > (I still have to figure out what the side-effect of this patch is.)
>> >
>> > Thanks,
>> > -Takahiro AKASHI
>> >
>> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
>> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
>> >> <ard.biesheuvel@linaro.org> wrote:
>> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
>> >> > <takahiro.akashi@linaro.org> wrote:
>> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> >>> > > <takahiro.akashi@linaro.org> wrote:
>> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
>> >> >>> > > >> > Bhupesh, Ard,
>> >> >>> > > >> >
>> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> >>> > > >> >> Hi Ard, Akashi
>> >> >>> > > >> >>
>> >> >>> > > >> > (snip)
>> >> >>> > > >> >
>> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> >>> > > >> >> , for details)
>> >> >>> > > >> >
>> >> >>> > > >> > Right.
>> >> >>> > > >> >
>> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> >>> > > >> >> with the crashkernel memory range:
>> >> >>> > > >> >>
>> >> >>> > > >> >>                 /* add linux,usable-memory-range */
>> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> >>> > > >> >>                                 address_cells, size_cells);
>> >> >>> > > >> >>
>> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> >>> > > >> >> , for details)
>> >> >>> > > >> >>
>> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> >>> > > >> >> they are marked as System RAM or as RESERVED, as the
>> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
>> >> >>> > > >> >>
>> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> >>> > > >> >> -r`.img --reuse-cmdline -d
>> >> >>> > > >> >>
>> >> >>> > > >> >> [snip..]
>> >> >>> > > >> >>
>> >> >>> > > >> >> Reserved memory range
>> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> Coredump memory ranges
>> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >>> > > >> >>
>> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >>> > > >> >> {
>> >> >>> > > >> >>         struct memblock_region reg = {
>> >> >>> > > >> >>                 .size = 0,
>> >> >>> > > >> >>         };
>> >> >>> > > >> >>
>> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >>> > > >> >>
>> >> >>> > > >> >>         if (reg.size)
>> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >>> > > >> >> comment this out */
>> >> >>> > > >> >> }
>> >> >>> > > >> >
>> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
>> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >>> > > >> >
>> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >>> > > >> >> fail.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >> >>> > > >> >
>> >> >>> > > >> > I still don't understand why we need to carry over the information
>> >> >>> > > >> > about "ACPI Reclaim memory" to the crash dump kernel. In my understanding,
>> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >> >>> > > >> > initialization. Why does the crash dump kernel need to know about them?
>> >> >>> > > >> >
>> >> >>> > > >>
>> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
>> >> >>> > > >> memblock_reserve()'d now.
>> >> >>> > > >
>> >> >>> > > > For my better understanding, who actually accesses such regions
>> >> >>> > > > at boot time: UEFI itself or the EFI stub?
>> >> >>> > > >
>> >> >>> > >
>> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >> >>> > > instance, on QEMU we have
>> >> >>> > >
>> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001  01000013)
>> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
>> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
>> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
>> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
>> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
>> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
>> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
>> >> >>> > >
>> >> >>> > > covered by
>> >> >>> > >
>> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> >>> > >  ...
>> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> >>> >
>> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >> >>> > UEFI boot services.
>> >> >>> >
>> >> >>> > >
>> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >> >>> > > >> when booting the next kernel.
>> >> >>> > > >
>> >> >>> > > > not really.
>> >> >>> > > >
>> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> >>> > > >> > on crash dump kernel?)
>> >> >>> > > >> >
>> >> >>> > > >>
>> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >> >>> > > >> regions only revealed the bug, not created it (given that other
>> >> >>> > > >> memblock_reserve regions may be affected as well)
>> >> >>> > > >
>> >> >>> > > > Since whether we should honor such reserved regions across kexec
>> >> >>> > > > depends on each one's specific nature, we will have to handle them one by one.
>> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >> >>> > > > exposed to user space (via /proc/iomem).
>> >> >>> > > >
>> >> >>> > >
>> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >> >>> > > as 'System RAM'. Do you think that could solve this?
>> >> >>> >
>> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> >>> > marking them under another name in /proc/iomem would also be good in order
>> >> >>> > not to allocate them as part of crash kernel's memory.
>> >> >>> >
>> >> >>> > But I'm still not convinced that we should export them in
>> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
>> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >> >>> > (or memblocks), I guess.
>> >> >>> >     -> Bhupesh?
>> >> >>>
>> >> >>> I forget how the arm64 kernel retrieves the memory ranges and initializes
>> >> >>> them.  If there is no "e820"-like interface, shouldn't the kernel
>> >> >>> reinitialize all the memory according to the EFI memmap?  For the kdump
>> >> >>> kernel, anything other than usable memory (which comes from the dt node
>> >> >>> instead) should be reinitialized according to the EFI-passed info, no?
>> >> >>
>> >> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> >> usable-memory-range by fdt_enforce_memory_region().
>> >> >>
>> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> >> with multiple entries in usable-memory-range.
>> >> >>
>> >> >
>> >> > In any case, the root of the problem is that memory regions lose their
>> >> > 'memory' annotation due to the way the memory map is mangled before
>> >> > being supplied to the kexec kernel.
>> >> >
>> >> > Would it be possible to classify all memory that we want to hide from
>> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> >> > so this seems to be the most appropriate way to deal with the host
>> >> > kernel's memory contents.
>> >>
>> >> Hmm. wouldn't appending the acpi reclaim regions to
>> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> >> be better? Because its indirectly achieving a similar objective
>> >> (although may be a subset of all System RAM regions on the primary
>> >> kernel's memory).
>> >>
>> >> I am not aware of the background about the current kexec-tools
>> >> implementation where we add only the crashkernel range to the dtb
>> >> being passed to the crashkernel.
>> >>
>> >> Probably Akashi can answer better, as to how we arrived at this design
>> >> approach and why we didn't want to expose all System RAM regions (i.e.
>> >> ! NOMAP regions) to the crashkernel.
>> >>
>> >> I suspect that some issues were seen/met when the System RAM (!
>> >> NOMAP regions) were exposed to the crashkernel, and that's why we
>> >> finalized on this design approach, but this is something which is just
>> >> my guess.
>> >>
>> >> Regards,
>> >> Bhupesh
>> >>
>> >> >>> >
>> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
>> >> >>> > via a kernel command line parameter, "memmap=".
>> >> >>>
>> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >> >>> e820 table.
>> >> >>
>> >> >> Thanks. I remember that you have explained it before.
>> >> >>
>> >> >> -Takahiro AKASHI
>> >> >>
>> >> >>> [snip]
>> >> >>>
>> >> >>> Thanks
>> >> >>> Dave
>> >
>> > ===8<==
>> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >
>> > ---
>> >  arch/arm64/mm/init.c | 10 ++++++++--
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> > index 00e7b900ca41..8175db94257b 100644
>> > --- a/arch/arm64/mm/init.c
>> > +++ b/arch/arm64/mm/init.c
>> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >         struct memblock_region reg = {
>> >                 .size = 0,
>> >         };
>> > +       u64 idx;
>> > +       phys_addr_t start, end;
>> >
>> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >
>> > -       if (reg.size)
>> > -               memblock_cap_memory_range(reg.base, reg.size);
>> > +       if (reg.size) {
>> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> > +                                       &start, &end, NULL)
>> > +                       memblock_mark_nomap(start, end - start);
>> > +               memblock_clear_nomap(reg.base, reg.size);
>> > +       }
>> >  }
>> >
>> >  void __init arm64_memblock_init(void)
>> > --
>> > 2.15.1
>> >
>>
>> Thanks for the patch. After applying this on top of
>> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
>> crashkernel boot no longer hangs while trying to access the ACPI
>> tables.
>>
>> However, I notice a minor issue. Please see the log below for
>> reference: the following message keeps spamming the console, but I see
>> the crashkernel boot proceed further:
>>
>> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
>> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
>> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
>> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
>> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
>> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
>> [    0.000000] NUMA: NODE_DATA(1) on node 0
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
>> [    0.000000] NUMA: NODE_DATA(2) on node 0
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
>> [    0.000000] NUMA: NODE_DATA(3) on node 0
>> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
>> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
>> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
>> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
>> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
>> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
>>
>> [snip..]
>> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
>
> These messages show that some "struct page" data are allocated on remote
> (NUMA) nodes.
> Since, on your crash dump kernel, all the usable system memory (starting at
> 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
>
> My best guess is that you can ignore them, apart from some performance penalty.
> This may be one side effect.
>
> So does your crash dump kernel now boot successfully?
>

Indeed. The crash dump kernel now boots successfully and the crash
dump core can be saved properly as well (I tried saving it to local
disk).
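
For anyone following along, the effect of the tested change can be modeled
in userspace roughly as below. This is only a sketch under stated
assumptions: it tracks memory in fixed-size chunks instead of memblock
regions, and struct chunk, model_enforce_usable_range() and the two helper
predicates are made-up names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one entry per fixed-size chunk of physical memory. */
#define NR_CHUNKS 16

struct chunk {
	bool memory;    /* present in the (modeled) memblock.memory */
	bool reserved;  /* memblock_reserve()'d, e.g. ACPI reclaim memory */
	bool nomap;     /* excluded from the linear mapping */
};

/*
 * Mimic the tested change: hide every free (memory && !reserved) chunk
 * by marking it NOMAP, then re-expose the usable range [base, base + size).
 * Reserved chunks are never touched, so they stay mapped.
 */
static void model_enforce_usable_range(struct chunk *c, size_t base, size_t size)
{
	size_t i;

	assert(base + size <= NR_CHUNKS);

	if (!size)
		return;

	for (i = 0; i < NR_CHUNKS; i++)
		if (c[i].memory && !c[i].reserved && !c[i].nomap)
			c[i].nomap = true;

	for (i = base; i < base + size; i++)
		c[i].nomap = false;
}

/* Reachable through the linear map: present and not NOMAP. */
static bool model_is_mapped(const struct chunk *c, size_t i)
{
	return c[i].memory && !c[i].nomap;
}

/* Usable for allocations by the crash kernel: mapped and not reserved. */
static bool model_is_allocatable(const struct chunk *c, size_t i)
{
	return model_is_mapped(c, i) && !c[i].reserved;
}
```

With all 16 chunks present, chunks 3-4 reserved (standing in for ACPI
reclaim memory) and a usable range of chunks 8-11, chunks 3-4 stay mapped
but are not allocatable, chunks 8-11 are allocatable, and every other chunk
ends up NOMAP.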

However, the 'potential offnode page_structs' WARN messages hog the
console and delay crashkernel boot for a significant duration, which
can be irritating.

Can we also consider ratelimiting this WARNING message [which seems to
come from vmemmap_verify()] if invoked in the context of the crash kernel,
in addition to making the above change suggested by you?
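
As an illustration of what ratelimiting could look like, here is a minimal
userspace sketch of a window-based limiter. It is a stand-in built on
assumptions, not the kernel's printk ratelimit code: struct rl_state and
rl_allow() are hypothetical names, and time is passed in as a plain tick
counter so that the behaviour is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rate-limit state (not the kernel's own structure). */
struct rl_state {
	unsigned long interval; /* window length, in arbitrary ticks */
	unsigned int burst;     /* messages allowed per window */
	unsigned long begin;    /* start tick of the current window */
	unsigned int printed;   /* messages emitted in the current window */
	unsigned int missed;    /* messages suppressed in the current window */
};

/* Return true if a message may be emitted at tick 'now'. */
static bool rl_allow(struct rl_state *rs, unsigned long now)
{
	assert(rs->interval > 0);

	/* Start a new window once the previous one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over budget for this window: count the message and drop it. */
	rs->missed++;
	return false;
}
```

With interval = 5 and burst = 2, calls at ticks 0 and 1 are allowed, a call
at tick 2 is suppressed, and a call at tick 5 opens a new window and is
allowed again.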

Thanks for the help.

Regards,
Bhupesh

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-23 19:51                                                                                 ` Bhupesh Sharma
  (?)
@ 2017-12-25  3:25                                                                                     ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-25  3:25 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> Hello Akashi,
> >>
> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > Bhupesh,
> >> >
> >> > Can you test the patch attached below, please?
> >> >
> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> > exporting them via usable-memory-range.
> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >
> >> > Thanks,
> >> > -Takahiro AKASHI
> >> >
> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >>> > > >> >
> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >>> > > >> >>
> >> >> >>> > > >> > (snip)
> >> >> >>> > > >> >
> >> >> >>> > > >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow the crash dump kernel to
> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >
> >> >> >>> > > >> > Right.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED, because the
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property is patched up only with
> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> [snip..]
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Reserved memory range
> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >>> > > >> >> {
> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >>> > > >> >>                 .size = 0,
> >> >> >>> > > >> >>         };
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         if (reg.size)
> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >> >>> > > >> >> }
> >> >> >>> > > >> >
> >> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> >>> > > >> >> fail.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >>> > > >> >
> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >>> > > >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >>> > > >> > initialization. Why does the crash dump kernel need to know about them?
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >>> > > >
> >> >> >>> > > > For my better understanding, who actually accesses such regions
> >> >> >>> > > > at boot time: UEFI itself or the EFI stub?
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >>> > > instance, on QEMU we have
> >> >> >>> > >
> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001  01000013)
> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
> >> >> >>> > >
> >> >> >>> > > covered by
> >> >> >>> > >
> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >>> > >  ...
> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >>> >
> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >>> > UEFI boot services.
> >> >> >>> >
> >> >> >>> > >
> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >>> > > >> when booting the next kernel.
> >> >> >>> > > >
> >> >> >>> > > > not really.
> >> >> >>> > > >
> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >>> > > >
> >> >> >>> > > > Since whether we should honor such reserved regions across kexec
> >> >> >>> > > > depends on each one's specific nature, we will have to handle them one by one.
> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >>> > > > exposed to user space (via /proc/iomem).
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >>> >
> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >>> >
> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> >>> > (or memblocks), I guess.
> >> >> >>> >     -> Bhupesh?
> >> >> >>>
> >> >> >>> I forget how the arm64 kernel retrieves the memory ranges and initializes
> >> >> >>> them.  If there is no "e820"-like interface, shouldn't the kernel
> >> >> >>> reinitialize all the memory according to the EFI memmap?  For the kdump
> >> >> >>> kernel, anything other than usable memory (which comes from the dt node
> >> >> >>> instead) should be reinitialized according to the EFI-passed info, no?
> >> >> >>
> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >>
> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> with multiple entries in usable-memory-range.
> >> >> >>
> >> >> >
> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> > being supplied to the kexec kernel.
> >> >> >
> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> > kernel's memory contents.
> >> >>
> >> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> be better? Because its indirectly achieving a similar objective
> >> >> (although may be a subset of all System RAM regions on the primary
> >> >> kernel's memory).
> >> >>
> >> >> I am not aware of the background about the current kexec-tools
> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> being passed to the crashkernel.
> >> >>
> >> >> Probably Akashi can answer better, as to how we arrived at this design
> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> ! NOMAP regions) to the crashkernel.
> >> >>
> >> >> I suspect that some issues were seen/met when the System RAM (!
> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> finalized on this design approach, but this is something which is just
> >> >> my guess.
> >> >>
> >> >> Regards,
> >> >> Bhupesh
> >> >>
> >> >> >>> >
> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >>>
> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >>> e820 table.
> >> >> >>
> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >>
> >> >> >> -Takahiro AKASHI
> >> >> >>
> >> >> >>> [snip]
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Dave
> >> >
> >> > ===8<==
> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >
> >> > ---
> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> > index 00e7b900ca41..8175db94257b 100644
> >> > --- a/arch/arm64/mm/init.c
> >> > +++ b/arch/arm64/mm/init.c
> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >         struct memblock_region reg = {
> >> >                 .size = 0,
> >> >         };
> >> > +       u64 idx;
> >> > +       phys_addr_t start, end;
> >> >
> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >
> >> > -       if (reg.size)
> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> > +       if (reg.size) {
> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> > +                                       &start, &end, NULL)
> >> > +                       memblock_mark_nomap(start, end - start);
> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> > +       }
> >> >  }
> >> >
> >> >  void __init arm64_memblock_init(void)
> >> > --
> >> > 2.15.1
> >> >
> >>
> >> Thanks for the patch. After applying this on top of
> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> crashkernel boot no longer hangs while trying to access the ACPI
> >> tables.
> >>
> >> However, I notice a minor issue. Please see the log below for
> >> reference: the following message keeps spamming the console, but I see
> >> the crashkernel boot proceed further:
> >>
> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> >>
> >> [snip..]
> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
> >
> > These messages show that some "struct page" data are allocated on remote
> > (NUMA) nodes.
> > Since, on your crash dump kernel, all the usable system memory (starting at
> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >
> > My best guess is that you can ignore them, apart from some performance penalty.
> > This may be one side effect.
> >
> > So does your crash dump kernel now boot successfully?
> >
> 
> Indeed. The crash dump kernel now boots successfully and the crash
> dump core can be saved properly as well (I tried saving it to local
> disk).

Thank you for the confirmation.
(I'd suggest examining the core dump with the crash utility.)

> However, the 'potential offnode page_structs' WARN messages hog the
> console and delay crashkernel boot for a significant duration, which
> can be irritating.
> 
> Can we also consider ratelimiting this WARNING message [which seems to
> come from vmemmap_verify()] if invoked in the context of the crash kernel,
> in addition to making the above change suggested by you?

Well, we may be able to change pr_warn() to pr_warn_once() here, but
I hope that adding "numa=off" to the kernel command line will also work.
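
For illustration, the "warn once" behaviour could be sketched in userspace
as follows. This is not the kernel's pr_warn_once() implementation;
warn_offnode_once() is a hypothetical helper, and the message text is simply
copied from the log quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print the off-node warning for [start, end] only on the first call.
 * Returns true if the message was printed, false if it was suppressed.
 * Hypothetical helper for illustration; not a kernel function.
 */
static bool warn_offnode_once(unsigned long long start, unsigned long long end)
{
	static bool warned; /* zero-initialized: nothing printed yet */

	assert(start <= end);

	if (warned)
		return false;

	warned = true;
	fprintf(stderr, "[%llx-%llx] potential offnode page_structs\n",
		start, end);
	return true;
}
```

Note that a single flag silences every later range, so only the first
off-node range would be reported; a ratelimited variant would keep some of
the later messages instead.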

Thanks,
-Takahiro AKASHI


> Thanks for the help.
> 
> Regards,
> Bhupesh

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-25  3:25                                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-25  3:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> Hello Akashi,
> >>
> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> <takahiro.akashi@linaro.org> wrote:
> >> > Bhupesh,
> >> >
> >> > Can you test the patch attached below, please?
> >> >
> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> > exporting them via usable-memory-range.
> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >
> >> > Thanks,
> >> > -Takahiro AKASHI
> >> >
> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> <ard.biesheuvel@linaro.org> wrote:
> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >>> > > >> >
> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >>> > > >> >>
> >> >> >>> > > >> > (snip)
> >> >> >>> > > >> >
> >> >> >>> > > >> >> Looking deeper into the issue: the arm64 kexec-tools uses the
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow the crash dump kernel to
> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >
> >> >> >>> > > >> > Right.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> [snip..]
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Reserved memory range
> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >>> > > >> >> {
> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >>> > > >> >>                 .size = 0,
> >> >> >>> > > >> >>         };
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         if (reg.size)
> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >> >>> > > >> >> }
> >> >> >>> > > >> >
> >> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> >>> > > >> >> fail.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >>> > > >> >
> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >>> > > >
> >> >> >>> > > > For my better understandings, who is actually accessing such regions
> >> >> >>> > > > during boot time, uefi itself or efistub?
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >>> > > instance, on QEMU we have
> >> >> >>> > >
> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001   01000013)
> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
> >> >> >>> > >
> >> >> >>> > > covered by
> >> >> >>> > >
> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >>> > >  ...
> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >>> >
> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >>> > UEFI boot services.
> >> >> >>> >
> >> >> >>> > >
> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >>> > > >> when booting the next kernel.
> >> >> >>> > > >
> >> >> >>> > > > not really.
> >> >> >>> > > >
> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >>> > > >
> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >>> > > > exposed to user space (via proc/iomem).
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >>> >
> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >>> >
> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
> >> >> >>> > acpi_os_map_memory() and so won't need to be part of system ram
> >> >> >>> > (or memblocks), I guess.
> >> >> >>> >     -> Bhupesh?
> >> >> >>>
> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >> >> >>> than usable memory (which is from the dt node instead) should be
> >> >> >>> reinitialized according to efi passed info, no?
> >> >> >>
> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >>
> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> with multiple entries in usable-memory-range.
> >> >> >>
> >> >> >
> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> > being supplied to the kexec kernel.
> >> >> >
> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> > kernel's memory contents.
> >> >>
> >> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> be better? Because it's indirectly achieving a similar objective
> >> >> (although may be a subset of all System RAM regions on the primary
> >> >> kernel's memory).
> >> >>
> >> >> I am not aware of the background about the current kexec-tools
> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> being passed to the crashkernel.
> >> >>
> >> >> Probably Akashi can answer better, as to how we arrived at this design
> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> ! NOMAP regions) to the crashkernel.
> >> >>
> >> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> finalized on this design approach, but this is something which is just
> >> >> my guess.
> >> >>
> >> >> Regards,
> >> >> Bhupesh
> >> >>
> >> >> >>> >
> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >>>
> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >>> e820 table.
> >> >> >>
> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >>
> >> >> >> -Takahiro AKASHI
> >> >> >>
> >> >> >>> [snip]
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Dave
> >> >
> >> > ===8<==
> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >
> >> > ---
> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> > index 00e7b900ca41..8175db94257b 100644
> >> > --- a/arch/arm64/mm/init.c
> >> > +++ b/arch/arm64/mm/init.c
> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >         struct memblock_region reg = {
> >> >                 .size = 0,
> >> >         };
> >> > +       u64 idx;
> >> > +       phys_addr_t start, end;
> >> >
> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >
> >> > -       if (reg.size)
> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> > +       if (reg.size) {
> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> > +                                       &start, &end, NULL)
> >> > +                       memblock_mark_nomap(start, end - start);
> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> > +       }
> >> >  }
> >> >
> >> >  void __init arm64_memblock_init(void)
> >> > --
> >> > 2.15.1
> >> >
> >>
> >> Thanks for the patch. After applying this on top of
> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> crashkernel boot no longer hangs while trying to access the acpi
> >> tables.
> >>
> >> However I notice a minor issue. Please see the log below for
> >> reference, the following message keeps spamming the console but I see
> >> the crashkernel boot proceed further:
> >>
> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> >>
> >> [snip..]
> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
> >
> > These messages show that some "struct page" data are allocated on remote
> > (numa) nodes.
> > Since on your crash dump kernel, all the usable system memory (starting
> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >
> > In my best guess, you can ignore them except for some performance penalty.
> > This may be one side-effect.
> >
> > So does your crash dump kernel now boot successfully?
> >
> 
> Indeed. The crash dump kernel now boots successfully and the crash
> dump core can be saved properly as well (I tried saving it to local
> disk).

Thank you for the confirmation.
(I'd suggest that you also examine the core dump with the crash utility.)

> However, the 'potential offnode page_structs' WARN messages hog the
> console and delay crashkernel boot for a significant duration, which
> can be irritating.
> 
> Can we also consider ratelimiting this WARNING message [which seems to
> come from vmemmap_verify()] if invoked in the context of crash kernel,
> in addition to making the above change suggested by you?

Well, we may be able to change pr_warn() to pr_warn_once() here, but
I hope that adding "numa=off" to the kernel command line will also work.
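
To illustrate the pr_warn_once() idea, here is a minimal userspace sketch.
demo_warn(), demo_warn_once() and count_offnode_warnings() are hypothetical
names used only for this illustration; this is not the kernel's printk
implementation, and the message text is simply modelled on the log above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of messages actually emitted; kept only so the effect is observable. */
static int emitted;

/* Hypothetical userspace stand-in for pr_warn(): prints on every call. */
#define demo_warn(...) \
	do { fprintf(stderr, __VA_ARGS__); emitted++; } while (0)

/*
 * Hypothetical stand-in for pr_warn_once(): each call site owns a static
 * flag, so only the first execution of that call site prints anything.
 */
#define demo_warn_once(...) \
	do { \
		static bool warned; \
		if (!warned) { \
			warned = true; \
			demo_warn(__VA_ARGS__); \
		} \
	} while (0)

/*
 * Pretend that 'nr_blocks' vmemmap blocks all land off-node and report
 * how many warnings reached the console during this call.
 */
int count_offnode_warnings(int nr_blocks)
{
	assert(nr_blocks >= 0);
	emitted = 0;
	for (int i = 0; i < nr_blocks; i++)
		demo_warn_once("block %d: potential offnode page_structs\n", i);
	return emitted;
}
```

With a per-call-site flag like this, only the first off-node block is
reported and every later one is skipped silently, so the console is no
longer flooded, at the cost of losing the individual address ranges.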

Thanks,
-Takahiro AKASHI


> Thanks for the help.
> 
> Regards,
> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-25  3:25                                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-25  3:25 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec,
	James Morse, Bhupesh SHARMA, Dave Young, linux-arm-kernel

On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> Hello Akashi,
> >>
> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> <takahiro.akashi@linaro.org> wrote:
> >> > Bhupesh,
> >> >
> >> > Can you test the patch attached below, please?
> >> >
> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> > exporting them via usable-memory-range.
> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >
> >> > Thanks,
> >> > -Takahiro AKASHI
> >> >
> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> <ard.biesheuvel@linaro.org> wrote:
> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >>> > > >> >
> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >>> > > >> >>
> >> >> >>> > > >> > (snip)
> >> >> >>> > > >> >
> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >
> >> >> >>> > > >> > Right.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> [snip..]
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Reserved memory range
> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >>> > > >> >> {
> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >>> > > >> >>                 .size = 0,
> >> >> >>> > > >> >>         };
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         if (reg.size)
> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >> >>> > > >> >> }
> >> >> >>> > > >> >
> >> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> >>> > > >> >> fail.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >>> > > >> >
> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >>> > > >
> >> >> >>> > > > For my better understandings, who is actually accessing such regions
> >> >> >>> > > > during boot time, uefi itself or efistub?
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >>> > > instance, on QEMU we have
> >> >> >>> > >
> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001   01000013)
> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001 BXPC 00000001)
> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001 BXPC 00000001)
> >> >> >>> > >
> >> >> >>> > > covered by
> >> >> >>> > >
> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >>> > >  ...
> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >>> >
> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >>> > UEFI boot services.
> >> >> >>> >
> >> >> >>> > >
> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >>> > > >> when booting the next kernel.
> >> >> >>> > > >
> >> >> >>> > > > not really.
> >> >> >>> > > >
> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >>> > > >
> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >>> > > > exposed to user space (via proc/iomem).
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >>> >
> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >>> >
> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
> >> >> >>> > acpi_os_map_memory() and so won't need to be part of system ram
> >> >> >>> > (or memblocks), I guess.
> >> >> >>> >     -> Bhupesh?
> >> >> >>>
> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >> >> >>> than usable memory (which is from the dt node instead) should be
> >> >> >>> reinitialized according to efi passed info, no?
> >> >> >>
> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >>
> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> with multiple entries in usable-memory-range.
> >> >> >>
> >> >> >
> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> > being supplied to the kexec kernel.
> >> >> >
> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> > kernel's memory contents.
> >> >>
> >> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> be better? Because it's indirectly achieving a similar objective
> >> >> (although may be a subset of all System RAM regions on the primary
> >> >> kernel's memory).
> >> >>
> >> >> I am not aware of the background about the current kexec-tools
> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> being passed to the crashkernel.
> >> >>
> >> >> Probably Akashi can answer better, as to how we arrived at this design
> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> ! NOMAP regions) to the crashkernel.
> >> >>
> >> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> finalized on this design approach, but this is something which is just
> >> >> my guess.
> >> >>
> >> >> Regards,
> >> >> Bhupesh
> >> >>
> >> >> >>> >
> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >>>
> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >>> e820 table.
> >> >> >>
> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >>
> >> >> >> -Takahiro AKASHI
> >> >> >>
> >> >> >>> [snip]
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Dave
> >> >
> >> > ===8<==
> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >
> >> > ---
> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> > index 00e7b900ca41..8175db94257b 100644
> >> > --- a/arch/arm64/mm/init.c
> >> > +++ b/arch/arm64/mm/init.c
> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >         struct memblock_region reg = {
> >> >                 .size = 0,
> >> >         };
> >> > +       u64 idx;
> >> > +       phys_addr_t start, end;
> >> >
> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >
> >> > -       if (reg.size)
> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> > +       if (reg.size) {
> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> > +                                       &start, &end, NULL)
> >> > +                       memblock_mark_nomap(start, end - start);
> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> > +       }
> >> >  }
> >> >
> >> >  void __init arm64_memblock_init(void)
> >> > --
> >> > 2.15.1
> >> >
> >>
> >> Thanks for the patch. After applying this on top of
> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> crashkernel boot no longer hangs while trying to access the acpi
> >> tables.
> >>
> >> However I notice a minor issue. Please see the log below for
> >> reference, the following message keeps spamming the console but I see
> >> the crashkernel boot proceed further:
> >>
> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
> >> page_structs
> >>
> >> [snip..]
> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
> >> page_structs
> >
> > These messages show that some "struct page" data are allocated on remote
> > (numa) nodes.
> > Since on your crash dump kernel, all the usable system memory (starting
> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >
> > In my best guess, you can ignore them except for some performance penalty.
> > This may be one side-effect.
> >
> > So does your crash dump kernel now boot successfully?
> >
> 
> Indeed. The crash dump kernel now boots successfully and the crash
> dump core can be saved properly as well (I tried saving it to local
> disk).

Thank you for the confirmation.
(I'd suggest that you also examine the core dump with the crash utility.)

> However, the 'potential offnode page_structs' WARN messages hog the
> console and delay crashkernel boot for a significant duration, which
> can be irritating.
> 
> Can we also consider ratelimiting this WARNING message [which seems to
> come from vmemmap_verify()] if invoked in the context of crash kernel,
> in addition to making the above change suggested by  you.

Well, we may be able to change pr_warn() to pr_warn_once() here, but
I expect that adding "numa=off" to the kernel command line would also work.

Thanks,
-Takahiro AKASHI


> Thanks for the help.
> 
> Regards,
> Bhupesh

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-25  3:25                                                                                     ` AKASHI Takahiro
  (?)
@ 2017-12-25 20:14                                                                                         ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-25 20:14 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young,
	Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
>> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
>> >> Hello Akashi,
>> >>
>> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> > Bhupesh,
>> >> >
>> >> > Can you test the patch attached below, please?
>> >> >
>> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
>> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
>> >> > exporting them via usable-memory-range.
>> >> > (I still have to figure out what the side-effect of this patch is.)
>> >> >
>> >> > Thanks,
>> >> > -Takahiro AKASHI
>> >> >
>> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
>> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
>> >> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
>> >> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> >> >>> > > >> > Bhupesh, Ard,
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> >> >>> > > >> >> Hi Ard, Akashi
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> > (snip)
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> >> >>> > > >> >> , for details)
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Right.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> >> >>> > > >> >> with the crashkernel memory range:
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
>> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> >> >>> > > >> >>                                 address_cells, size_cells);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> >> >>> > > >> >> , for details)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
>> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> [snip..]
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Reserved memory range
>> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Coredump memory ranges
>> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >> >>> > > >> >> {
>> >> >> >>> > > >> >>         struct memblock_region reg = {
>> >> >> >>> > > >> >>                 .size = 0,
>> >> >> >>> > > >> >>         };
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>         if (reg.size)
>> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >> >>> > > >> >> comment this out */
>> >> >> >>> > > >> >> }
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >> >>> > > >> >> fail.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > I still don't understand why we need to carry over the information
>> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
>> >> >> >>> > > >> >
>> >> >> >>> > > >>
>> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are
>> >> >> >>> > > >> memblock_reserve()'d now.
>> >> >> >>> > > >
>> >> >> >>> > > > For my better understandings, who is actually accessing such regions
>> >> >> >>> > > > during boot time, uefi itself or efistub?
>> >> >> >>> > > >
>> >> >> >>> > >
>> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >> >> >>> > > instance, on QEMU we have
>> >> >> >>> > >
>> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> >> >>> > >   01000013)
>> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >
>> >> >> >>> > > covered by
>> >> >> >>> > >
>> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> >> >>> > >  ...
>> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> >> >>> >
>> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >> >> >>> > UEFI boot services.
>> >> >> >>> >
>> >> >> >>> > >
>> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >> >> >>> > > >> when booting the next kernel.
>> >> >> >>> > > >
>> >> >> >>> > > > not really.
>> >> >> >>> > > >
>> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> >> >>> > > >> > on crash dump kernel?)
>> >> >> >>> > > >> >
>> >> >> >>> > > >>
>> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
>> >> >> >>> > > >> memblock_reserve regions may be affected as well)
>> >> >> >>> > > >
>> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
>> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
>> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >> >> >>> > > > exposed to user space (via proc/iomem).
>> >> >> >>> > > >
>> >> >> >>> > >
>> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
>> >> >> >>> >
>> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
>> >> >> >>> > not to allocate them as part of crash kernel's memory.
>> >> >> >>> >
>> >> >> >>> > But I'm not still convinced that we should export them in useable-
>> >> >> >>> > memory-range to crash dump kernel. They will be accessed through
>> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >> >> >>> > (or memblocks), I guess.
>> >> >> >>> >     -> Bhupesh?
>> >> >> >>>
>> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
>> >> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
>> >> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
>> >> >> >>> than usable memory (which is from the dt node instead) should be
>> >> >> >>> reinitialized according to efi passed info, no?
>> >> >> >>
>> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> >> >> usable-memory-range by fdt_enforce_memory_region().
>> >> >> >>
>> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> >> >> with multiple entries in usable-memory-range.
>> >> >> >>
>> >> >> >
>> >> >> > In any case, the root of the problem is that memory regions lose their
>> >> >> > 'memory' annotation due to the way the memory map is mangled before
>> >> >> > being supplied to the kexec kernel.
>> >> >> >
>> >> >> > Would it be possible to classify all memory that we want to hide from
>> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> >> >> > so this seems to be the most appropriate way to deal with the host
>> >> >> > kernel's memory contents.
>> >> >>
>> >> >> Hmm. wouldn't appending the acpi reclaim regions to
>> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> >> >> be better? Because its indirectly achieving a similar objective
>> >> >> (although may be a subset of all System RAM regions on the primary
>> >> >> kernel's memory).
>> >> >>
>> >> >> I am not aware of the background about the current kexec-tools
>> >> >> implementation where we add only the crashkernel range to the dtb
>> >> >> being passed to the crashkernel.
>> >> >>
>> >> >> Probably Akashi can answer better, as to how we arrived at this design
>> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
>> >> >> ! NOMAP regions) to the crashkernel.
>> >> >>
>> >> >> I am suspecting that some issues were seen/meet when the System RAM (!
>> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
>> >> >> finalized on this design approach, but this is something which is just
>> >> >> my guess.
>> >> >>
>> >> >> Regards,
>> >> >> Bhupesh
>> >> >>
>> >> >> >>> >
>> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> >> >>> > via a kernel command line parameter, "memmap=".
>> >> >> >>>
>> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >> >> >>> e820 table.
>> >> >> >>
>> >> >> >> Thanks. I remember that you have explained it before.
>> >> >> >>
>> >> >> >> -Takahiro AKASHI
>> >> >> >>
>> >> >> >>> [snip]
>> >> >> >>>
>> >> >> >>> Thanks
>> >> >> >>> Dave
>> >> >
>> >> > ===8<==
>> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> >> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >> >
>> >> > ---
>> >> >  arch/arm64/mm/init.c | 10 ++++++++--
>> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> >> > index 00e7b900ca41..8175db94257b 100644
>> >> > --- a/arch/arm64/mm/init.c
>> >> > +++ b/arch/arm64/mm/init.c
>> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >> >         struct memblock_region reg = {
>> >> >                 .size = 0,
>> >> >         };
>> >> > +       u64 idx;
>> >> > +       phys_addr_t start, end;
>> >> >
>> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >
>> >> > -       if (reg.size)
>> >> > -               memblock_cap_memory_range(reg.base, reg.size);
>> >> > +       if (reg.size) {
>> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> >> > +                                       &start, &end, NULL)
>> >> > +                       memblock_mark_nomap(start, end - start);
>> >> > +               memblock_clear_nomap(reg.base, reg.size);
>> >> > +       }
>> >> >  }
>> >> >
>> >> >  void __init arm64_memblock_init(void)
>> >> > --
>> >> > 2.15.1
>> >> >
>> >>
>> >> Thanks for the patch. After applying this on top of
>> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
>> >> crashkernel boot no longer hangs while trying to access the acpi
>> >> tables.
>> >>
>> >> However I notice a minor issue. Please see the log below for
>> >> reference, the following message keeps spamming the console but I see
>> >> the crashkernel boot proceed further.:
>> >>
>> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
>> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
>> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
>> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
>> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
>> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
>> >> page_structs
>> >>
>> >> [snip..]
>> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
>> >> page_structs
>> >
>> > These messages show that some "struct page" data are allocated on remote
>> > (numa) nodes.
>> > Since on your crash dump kernel, all the usable system memory (starting
>> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
>> >
>> > In my best guess, you can ignore them except for some performance penalty.
>> > This may be one side-effect.
>> >
>> > So does your crash dump kernel now boot successfully?
>> >
>>
>> Indeed. The crash dump kernel now boots successfully and the crash
>> dump core can be saved properly as well (I tried saving it to local
>> disk).
>
> Thank you for the confirmation.
> (I'd like to suggest you to examine the core dump with crash utility.)
>
>> However, the 'potential offnode page_structs' WARN messages hog the
>> console and delay crashkernel boot for a significant duration, which
>> can be irritating.
>>
>> Can we also consider ratelimiting this WARNING message [which seems to
>> come from vmemmap_verify()] if invoked in the context of crash kernel,
>> in addition to making the above change suggested by  you.
>
> Well, we may be able to change pr_warn() to pr_warn_once() here, but
> I hope that adding "numa=off" to kernel command line should also work.

Hmm, adding "numa=off" to the crashkernel bootargs works, and TBH it was
my initial thought as well, but I am not sure whether it will cause any
regressions on aarch64 systems which use the crashdump feature.

I think the 2nd solution, i.e. limiting how often the warning message is
printed, might be a better option. Can you please add the following
patch (maybe as a separate one) and send it along with the patch which
marks all areas other than the crashkernel region passed to the
crashkernel as NOMAP, so that we can get this issue fixed in the upstream
aarch64 kernel:

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 17acf01791fa..4c13fe3c644d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
        int actual_node = early_pfn_to_nid(pfn);

        if (node_distance(actual_node, node) > LOCAL_DISTANCE)
-               pr_warn("[%lx-%lx] potential offnode page_structs\n",
+               pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
                        start, end - 1);
 }
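
As a rough user-space illustration (not kernel code) of what the change
above does: pr_warn_once() keeps one static flag per call site, so only the
first off-node range is reported and later ranges stay silent. The names
warn_once(), verify_range(), demo_warn_once() and lines_printed below are
made up for this sketch, and the node-distance check is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter, only here so the effect can be observed. */
static int lines_printed;

/*
 * User-space stand-in for pr_warn_once(): every expansion gets its own
 * static flag, so each call site prints at most once.
 */
#define warn_once(...)                                  \
	do {                                            \
		static bool warned;                     \
		if (!warned) {                          \
			warned = true;                  \
			fprintf(stderr, __VA_ARGS__);   \
			lines_printed++;                \
		}                                       \
	} while (0)

/* Models the check in vmemmap_verify(); the node test is reduced to a flag. */
static void verify_range(unsigned long long start, unsigned long long end,
			 bool offnode)
{
	if (offnode)
		warn_once("[%llx-%llx] potential offnode page_structs\n",
			  start, end - 1);
}

/* Hypothetical driver: 32 off-node 64 KiB ranges, as in the log above. */
static int demo_warn_once(void)
{
	const unsigned long long base = 0xffff7fe008000000ULL;
	const unsigned long long step = 0x10000ULL;
	int i;

	for (i = 0; i < 32; i++)
		verify_range(base + i * step, base + (i + 1) * step, true);

	/* However many ranges are checked, at most one line is emitted. */
	assert(lines_printed <= 1);
	return lines_printed;
}
```

Calling demo_warn_once() from a main() prints a single line for the first
range only. The trade-off is that the remaining ranges are no longer
visible in the log, which is acceptable here because the message is purely
informational.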

I have tested this solution on a Huawei TaiShan board and can boot the
crashkernel successfully and also save the crash core properly
(without the console warning flood which used to hold up the
crashkernel boot).
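
For reference, the effect of the NOMAP patch quoted earlier in this mail
can be modelled in user space roughly as follows. This is only a sketch
under simplifying assumptions: struct region, enforce_usable_range() and
demo_mapped_regions() are hypothetical names, regions are assumed to be
already split on the boundaries of the usable range (real memblock splits
them as needed), and the region marked "reserved" is an assumed placement
for the ACPI tables, not something taken from the logs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a memblock region; not the kernel's struct. */
struct region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory */
	bool nomap;	/* models MEMBLOCK_NOMAP */
};

/* Models the two steps the patch adds to fdt_enforce_memory_region(). */
static void enforce_usable_range(struct region *r, size_t n,
				 uint64_t ubase, uint64_t usize)
{
	size_t i;

	assert(usize != 0);	/* the patch only acts when reg.size is set */

	/* Step 1: hide every free (non-reserved) region. */
	for (i = 0; i < n; i++)
		if (!r[i].reserved)
			r[i].nomap = true;

	/* Step 2: make the usable (crashkernel) range visible again. */
	for (i = 0; i < n; i++)
		if (r[i].base >= ubase &&
		    r[i].base + r[i].size <= ubase + usize)
			r[i].nomap = false;
}

/* Hypothetical layout loosely based on the ranges in the kexec -d output. */
static size_t demo_mapped_regions(void)
{
	struct region map[] = {
		{ 0x00000000ULL, 0x0e800000ULL, false, false },
		{ 0x0e800000ULL, 0x20000000ULL, false, false }, /* crashkernel */
		{ 0x2e800000ULL, 0x0ae20000ULL, false, false },
		{ 0x39620000ULL, 0x00720000ULL, true,  false }, /* assumed reserved */
	};
	size_t n = sizeof(map) / sizeof(map[0]);
	size_t i, mapped = 0;

	enforce_usable_range(map, n, 0x0e800000ULL, 0x20000000ULL);

	for (i = 0; i < n; i++)
		if (!map[i].nomap)
			mapped++;
	return mapped;
}
```

In this model only the crashkernel range and the reserved region stay
mapped, while the rest of the first kernel's memory ends up NOMAP. That is
the property that lets the crashkernel reach the ACPI tables without
listing them in usable-memory-range.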

Thanks,
Bhupesh

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-25 20:14                                                                                         ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-25 20:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
>> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
>> >> Hello Akashi,
>> >>
>> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
>> >> <takahiro.akashi@linaro.org> wrote:
>> >> > Bhupesh,
>> >> >
>> >> > Can you test the patch attached below, please?
>> >> >
>> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
>> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
>> >> > exporting them via usable-memory-range.
>> >> > (I still have to figure out what the side-effect of this patch is.)
>> >> >
>> >> > Thanks,
>> >> > -Takahiro AKASHI
>> >> >
>> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
>> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
>> >> >> <ard.biesheuvel@linaro.org> wrote:
>> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
>> >> >> > <takahiro.akashi@linaro.org> wrote:
>> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
>> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
>> >> >> >>> > > >> > Bhupesh, Ard,
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> >> >>> > > >> >> Hi Ard, Akashi
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> > (snip)
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> >> >>> > > >> >> , for details)
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Right.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> >> >>> > > >> >> with the crashkernel memory range:
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
>> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> >> >>> > > >> >>                                 address_cells, size_cells);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> >> >>> > > >> >> , for details)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
>> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> [snip..]
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Reserved memory range
>> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Coredump memory ranges
>> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >> >>> > > >> >> {
>> >> >> >>> > > >> >>         struct memblock_region reg = {
>> >> >> >>> > > >> >>                 .size = 0,
>> >> >> >>> > > >> >>         };
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>         if (reg.size)
>> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >> >>> > > >> >> comment this out */
>> >> >> >>> > > >> >> }
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >> >>> > > >> >> fail.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > I still don't understand why we need to carry over the information
>> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
>> >> >> >>> > > >> >
>> >> >> >>> > > >>
>> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are
>> >> >> >>> > > >> memblock_reserve()'d now.
>> >> >> >>> > > >
>> >> >> >>> > > > For my better understandings, who is actually accessing such regions
>> >> >> >>> > > > during boot time, uefi itself or efistub?
>> >> >> >>> > > >
>> >> >> >>> > >
>> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >> >> >>> > > instance, on QEMU we have
>> >> >> >>> > >
>> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> >> >>> > >   01000013)
>> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >
>> >> >> >>> > > covered by
>> >> >> >>> > >
>> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> >> >>> > >  ...
>> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> >> >>> >
>> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >> >> >>> > UEFI boot services.
>> >> >> >>> >
>> >> >> >>> > >
>> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >> >> >>> > > >> when booting the next kernel.
>> >> >> >>> > > >
>> >> >> >>> > > > not really.
>> >> >> >>> > > >
>> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> >> >>> > > >> > on crash dump kernel?)
>> >> >> >>> > > >> >
>> >> >> >>> > > >>
>> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
>> >> >> >>> > > >> memblock_reserve regions may be affected as well)
>> >> >> >>> > > >
>> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
>> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
>> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >> >> >>> > > > exposed to user space (via proc/iomem).
>> >> >> >>> > > >
>> >> >> >>> > >
>> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
>> >> >> >>> >
>> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
>> >> >> >>> > not to allocate them as part of crash kernel's memory.
>> >> >> >>> >
>> >> >> >>> > But I'm not still convinced that we should export them in useable-
>> >> >> >>> > memory-range to crash dump kernel. They will be accessed through
>> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >> >> >>> > (or memblocks), I guess.
>> >> >> >>> >     -> Bhupesh?
>> >> >> >>>
>> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
>> >> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
>> >> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
>> >> >> >>> than usable memory (which is from the dt node instead) should be
>> >> >> >>> reinitialized according to efi passed info, no?
>> >> >> >>
>> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> >> >> usable-memory-range by fdt_enforce_memory_region().
>> >> >> >>
>> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> >> >> with multiple entries in usable-memory-range.
>> >> >> >>
>> >> >> >
>> >> >> > In any case, the root of the problem is that memory regions lose their
>> >> >> > 'memory' annotation due to the way the memory map is mangled before
>> >> >> > being supplied to the kexec kernel.
>> >> >> >
>> >> >> > Would it be possible to classify all memory that we want to hide from
>> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> >> >> > so this seems to be the most appropriate way to deal with the host
>> >> >> > kernel's memory contents.
>> >> >>
>> >> >> Hmm. wouldn't appending the acpi reclaim regions to
>> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> >> >> be better? Because its indirectly achieving a similar objective
>> >> >> (although may be a subset of all System RAM regions on the primary
>> >> >> kernel's memory).
>> >> >>
>> >> >> I am not aware of the background about the current kexec-tools
>> >> >> implementation where we add only the crashkernel range to the dtb
>> >> >> being passed to the crashkernel.
>> >> >>
>> >> >> Probably Akashi can answer better, as to how we arrived at this design
>> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
>> >> >> ! NOMAP regions) to the crashkernel.
>> >> >>
>> >> >> I am suspecting that some issues were seen/meet when the System RAM (!
>> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
>> >> >> finalized on this design approach, but this is something which is just
>> >> >> my guess.
>> >> >>
>> >> >> Regards,
>> >> >> Bhupesh
>> >> >>
>> >> >> >>> >
>> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> >> >>> > via a kernel command line parameter, "memmap=".
>> >> >> >>>
>> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >> >> >>> e820 table.
>> >> >> >>
>> >> >> >> Thanks. I remember that you have explained it before.
>> >> >> >>
>> >> >> >> -Takahiro AKASHI
>> >> >> >>
>> >> >> >>> [snip]
>> >> >> >>>
>> >> >> >>> Thanks
>> >> >> >>> Dave
>> >> >
>> >> > ===8<==
>> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >> >
>> >> > ---
>> >> >  arch/arm64/mm/init.c | 10 ++++++++--
>> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> >> > index 00e7b900ca41..8175db94257b 100644
>> >> > --- a/arch/arm64/mm/init.c
>> >> > +++ b/arch/arm64/mm/init.c
>> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >> >         struct memblock_region reg = {
>> >> >                 .size = 0,
>> >> >         };
>> >> > +       u64 idx;
>> >> > +       phys_addr_t start, end;
>> >> >
>> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >
>> >> > -       if (reg.size)
>> >> > -               memblock_cap_memory_range(reg.base, reg.size);
>> >> > +       if (reg.size) {
>> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> >> > +                                       &start, &end, NULL)
>> >> > +                       memblock_mark_nomap(start, end - start);
>> >> > +               memblock_clear_nomap(reg.base, reg.size);
>> >> > +       }
>> >> >  }
>> >> >
>> >> >  void __init arm64_memblock_init(void)
>> >> > --
>> >> > 2.15.1
>> >> >
>> >>
>> >> Thanks for the patch. After applying this on top of
>> >> 4.15.0-rc4-next-20171220, there seems to be a improvement and the
>> >> crashkernel boot no longer hangs while trying to access the acpi
>> >> tables.
>> >>
>> >> However I notice a minor issue. Please see the log below for
>> >> reference, the following message keeps spamming the console but I see
>> >> the crashkernel boot proceed further.:
>> >>
>> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
>> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
>> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
>> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
>> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
>> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
>> >> page_structs
>> >>
>> >> [snip..]
>> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
>> >> page_structs
>> >
>> > These messages show that some "struct page" data are allocated on remote
>> > (numa) nodes.
>> > Since on your crash dump kernel, all the usable system memory (starting
>> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
>> >
>> > In my best guess, you can ignore them except for some performance penalty.
>> > This may be one side-effect.
>> >
>> > So does your crash dump kernel now boot successfully?
>> >
>>
>> Indeed. The crash dump kernel now boots successfully and the crash
>> dump core can be saved properly as well (I tried saving it to local
>> disk).
>
> Thank you for the confirmation.
> (I'd like to suggest you to examine the core dump with crash utility.)
>
>> However, the 'potential offnode page_structs' WARN messages hog the
>> console and delay crashkernel boot for a significant duration, which
>> can be irritating.
>>
>> Can we also consider ratelimiting this WARNING message [which seems to
>> come from vmemmap_verify()] if invoked in the context of crash kernel,
>> in addition to making the above change suggested by  you.
>
> Well, we may be able to change pr_warn() to pr_warn_once() here, but
> I hope that adding "numa=off" to kernel command line should also work.

Hmm, adding "numa=off" to the crashkernel bootargs works, and TBH it was
my initial thought as well, but I am not sure whether this will cause
any regressions on aarch64 systems which use the crashdump feature.

I think the second solution, i.e. limiting how often the warning is
printed, might be the better option. Can you please add the following
patch (maybe as a separate one) and send it along with the patch which
marks all areas other than the crashkernel region being passed to the
crashkernel as NOMAP, so that we can get this issue fixed in the
upstream aarch64 kernel:

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 17acf01791fa..4c13fe3c644d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
        int actual_node = early_pfn_to_nid(pfn);

        if (node_distance(actual_node, node) > LOCAL_DISTANCE)
-               pr_warn("[%lx-%lx] potential offnode page_structs\n",
+               pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
                        start, end - 1);
 }
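
To illustrate what the one-word change buys, here is a minimal userspace
sketch (not kernel code) of the warn-once pattern. It assumes that
pr_warn_once() behaves like a per-call-site static flag; warn_once(),
verify_block(), walk_blocks() and lines_printed are names made up for
this illustration only.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of lines actually printed; only here so the effect is observable. */
static int lines_printed;

/*
 * Hypothetical stand-in for pr_warn_once(): one static flag per call site,
 * so the first hit prints and every later hit is silently dropped.
 */
#define warn_once(...)                                  \
	do {                                            \
		static bool warned;                     \
		if (!warned) {                          \
			warned = true;                  \
			fprintf(stderr, __VA_ARGS__);   \
			lines_printed++;                \
		}                                       \
	} while (0)

/* Models vmemmap_verify() being invoked once per vmemmap block. */
static void verify_block(unsigned long long start, unsigned long long end,
			 bool offnode)
{
	if (offnode)
		warn_once("[%llx-%llx] potential offnode page_structs\n",
			  start, end - 1);
}

/*
 * Walks nblocks consecutive 64K blocks, all treated as off-node, and
 * returns the total number of warning lines printed so far.
 */
static int walk_blocks(int nblocks)
{
	unsigned long long start = 0xffff7fe008000000ULL;
	const unsigned long long step = 0x10000ULL;
	int i;

	for (i = 0; i < nblocks; i++, start += step)
		verify_block(start, start + step, true);

	return lines_printed;
}
```

Calling walk_blocks(32) prints a single line instead of 32. The cost is
that only the first offending range is ever reported, so the remaining
ranges are no longer visible in the log.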

I have tested this solution on a Huawei Taishan board and can boot the
crashkernel successfully and also save the crash core properly
(without the console warning flood which used to hold up the
crashkernel boot).
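
For anyone following along, the effect of the NOMAP patch quoted above
can be pictured with a small userspace model (not kernel code). It
assumes regions are already split at the relevant boundaries, which the
real memblock code does on its own. The region table is illustrative:
the addresses are loosely taken from the 'kexec -d' output earlier in
the thread, and treating the gap at 0x39620000 as the reserved ACPI
reclaim area is an assumption. struct region, enforce_usable_range()
and is_mapped() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* stands in for a memblock_reserve()'d range */
	bool nomap;	/* stands in for MEMBLOCK_NOMAP */
};

/* Illustrative layout only; see the assumptions stated above. */
static struct region regions[] = {
	{ 0x00000000ULL, 0x0e800000ULL, false, false },	/* old kernel RAM */
	{ 0x0e800000ULL, 0x20000000ULL, false, false },	/* crashkernel range */
	{ 0x2e800000ULL, 0x0ae20000ULL, false, false },	/* old kernel RAM */
	{ 0x39620000ULL, 0x00720000ULL, true,  false },	/* assumed ACPI reclaim */
};

#define NR_REGIONS (sizeof(regions) / sizeof(regions[0]))

/* Mirrors the two steps of the patch: hide all free memory, then unhide
 * the usable range. Reserved regions are never touched. */
static void enforce_usable_range(uint64_t base, uint64_t size)
{
	size_t i;

	/* Step 1: every free (non-reserved) region becomes NOMAP. */
	for (i = 0; i < NR_REGIONS; i++)
		if (!regions[i].reserved)
			regions[i].nomap = true;

	/* Step 2: regions inside the usable range lose NOMAP again. */
	for (i = 0; i < NR_REGIONS; i++)
		if (regions[i].base >= base &&
		    regions[i].base + regions[i].size <= base + size)
			regions[i].nomap = false;
}

/* True if addr lies in a known region that is not marked NOMAP. */
static bool is_mapped(uint64_t addr)
{
	size_t i;

	for (i = 0; i < NR_REGIONS; i++)
		if (addr >= regions[i].base &&
		    addr - regions[i].base < regions[i].size)
			return !regions[i].nomap;

	return false;
}
```

After enforce_usable_range(0x0e800000, 0x20000000), the crashkernel
range and the reserved region remain mapped while the rest of the old
kernel's memory is hidden, which is the behaviour described for the
patch: already-reserved regions stay in system RAM without being listed
in usable-memory-range.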

Thanks,
Bhupesh

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-25 20:14                                                                                         ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-25 20:14 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young,
	Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi,
	Mark Rutland, James Morse, kexec

On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
>> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
>> <takahiro.akashi@linaro.org> wrote:
>> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
>> >> Hello Akashi,
>> >>
>> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
>> >> <takahiro.akashi@linaro.org> wrote:
>> >> > Bhupesh,
>> >> >
>> >> > Can you test the patch attached below, please?
>> >> >
>> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
>> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
>> >> > exporting them via usable-memory-range.
>> >> > (I still have to figure out what the side-effect of this patch is.)
>> >> >
>> >> > Thanks,
>> >> > -Takahiro AKASHI
>> >> >
>> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
>> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
>> >> >> <ard.biesheuvel@linaro.org> wrote:
>> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
>> >> >> > <takahiro.akashi@linaro.org> wrote:
>> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
>> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
>> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
>> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
>> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
>> >> >> >>> > > >> > Bhupesh, Ard,
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> >> >>> > > >> >> Hi Ard, Akashi
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> > (snip)
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
>> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
>> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> >> >>> > > >> >> , for details)
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Right.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> >> >>> > > >> >> with the crashkernel memory range:
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
>> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
>> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> >> >>> > > >> >>                                 address_cells, size_cells);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> >> >>> > > >> >> , for details)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
>> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> [snip..]
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Reserved memory range
>> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Coredump memory ranges
>> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >> >>> > > >> >> {
>> >> >> >>> > > >> >>         struct memblock_region reg = {
>> >> >> >>> > > >> >>                 .size = 0,
>> >> >> >>> > > >> >>         };
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >>         if (reg.size)
>> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >> >>> > > >> >> comment this out */
>> >> >> >>> > > >> >> }
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >> >>> > > >> >> fail.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > I still don't understand why we need to carry over the information
>> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
>> >> >> >>> > > >> >
>> >> >> >>> > > >>
>> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are
>> >> >> >>> > > >> memblock_reserve()'d now.
>> >> >> >>> > > >
>> >> >> >>> > > > For my better understandings, who is actually accessing such regions
>> >> >> >>> > > > during boot time, uefi itself or efistub?
>> >> >> >>> > > >
>> >> >> >>> > >
>> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >> >> >>> > > instance, on QEMU we have
>> >> >> >>> > >
>> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
>> >> >> >>> > >   01000013)
>> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
>> >> >> >>> > > BXPC 00000001)
>> >> >> >>> > >
>> >> >> >>> > > covered by
>> >> >> >>> > >
>> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >> >> >>> > >  ...
>> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >> >> >>> >
>> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >> >> >>> > UEFI boot services.
>> >> >> >>> >
>> >> >> >>> > >
>> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >> >> >>> > > >> when booting the next kernel.
>> >> >> >>> > > >
>> >> >> >>> > > > not really.
>> >> >> >>> > > >
>> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> >> >>> > > >> > on crash dump kernel?)
>> >> >> >>> > > >> >
>> >> >> >>> > > >>
>> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
>> >> >> >>> > > >> memblock_reserve regions may be affected as well)
>> >> >> >>> > > >
>> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
>> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
>> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >> >> >>> > > > exposed to user space (via proc/iomem).
>> >> >> >>> > > >
>> >> >> >>> > >
>> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
>> >> >> >>> >
>> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
>> >> >> >>> > not to allocate them as part of crash kernel's memory.
>> >> >> >>> >
>> >> >> >>> > But I'm not still convinced that we should export them in useable-
>> >> >> >>> > memory-range to crash dump kernel. They will be accessed through
>> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >> >> >>> > (or memblocks), I guess.
>> >> >> >>> >     -> Bhupesh?
>> >> >> >>>
>> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
>> >> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
>> >> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
>> >> >> >>> than usable memory (which is from the dt node instead) should be
>> >> >> >>> reinitialized according to efi passed info, no?
>> >> >> >>
>> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> >> >> usable-memory-range by fdt_enforce_memory_region().
>> >> >> >>
>> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> >> >> with multiple entries in usable-memory-range.
>> >> >> >>
>> >> >> >
>> >> >> > In any case, the root of the problem is that memory regions lose their
>> >> >> > 'memory' annotation due to the way the memory map is mangled before
>> >> >> > being supplied to the kexec kernel.
>> >> >> >
>> >> >> > Would it be possible to classify all memory that we want to hide from
>> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> >> >> > so this seems to be the most appropriate way to deal with the host
>> >> >> > kernel's memory contents.
>> >> >>
>> >> >> Hmm. wouldn't appending the acpi reclaim regions to
>> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> >> >> be better? Because its indirectly achieving a similar objective
>> >> >> (although may be a subset of all System RAM regions on the primary
>> >> >> kernel's memory).
>> >> >>
>> >> >> I am not aware of the background about the current kexec-tools
>> >> >> implementation where we add only the crashkernel range to the dtb
>> >> >> being passed to the crashkernel.
>> >> >>
>> >> >> Probably Akashi can answer better, as to how we arrived at this design
>> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
>> >> >> ! NOMAP regions) to the crashkernel.
>> >> >>
>> >> >> I am suspecting that some issues were seen/meet when the System RAM (!
>> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
>> >> >> finalized on this design approach, but this is something which is just
>> >> >> my guess.
>> >> >>
>> >> >> Regards,
>> >> >> Bhupesh
>> >> >>
>> >> >> >>> >
>> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> >> >>> > via a kernel command line parameter, "memmap=".
>> >> >> >>>
>> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >> >> >>> e820 table.
>> >> >> >>
>> >> >> >> Thanks. I remember that you have explained it before.
>> >> >> >>
>> >> >> >> -Takahiro AKASHI
>> >> >> >>
>> >> >> >>> [snip]
>> >> >> >>>
>> >> >> >>> Thanks
>> >> >> >>> Dave
>> >> >
>> >> > ===8<==
>> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >> >
>> >> > ---
>> >> >  arch/arm64/mm/init.c | 10 ++++++++--
>> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> >> > index 00e7b900ca41..8175db94257b 100644
>> >> > --- a/arch/arm64/mm/init.c
>> >> > +++ b/arch/arm64/mm/init.c
>> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >> >         struct memblock_region reg = {
>> >> >                 .size = 0,
>> >> >         };
>> >> > +       u64 idx;
>> >> > +       phys_addr_t start, end;
>> >> >
>> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >
>> >> > -       if (reg.size)
>> >> > -               memblock_cap_memory_range(reg.base, reg.size);
>> >> > +       if (reg.size) {
>> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> >> > +                                       &start, &end, NULL)
>> >> > +                       memblock_mark_nomap(start, end - start);
>> >> > +               memblock_clear_nomap(reg.base, reg.size);
>> >> > +       }
>> >> >  }
>> >> >
>> >> >  void __init arm64_memblock_init(void)
>> >> > --
>> >> > 2.15.1
>> >> >
>> >>
>> >> Thanks for the patch. After applying this on top of
>> >> 4.15.0-rc4-next-20171220, there seems to be a improvement and the
>> >> crashkernel boot no longer hangs while trying to access the acpi
>> >> tables.
>> >>
>> >> However I notice a minor issue. Please see the log below for
>> >> reference, the following message keeps spamming the console but I see
>> >> the crashkernel boot proceed further.:
>> >>
>> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
>> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
>> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
>> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
>> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
>> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
>> >> page_structs
>> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
>> >> page_structs
>> >>
>> >> [snip..]
>> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
>> >> page_structs
>> >
>> > These messages show that some "struct page" data are allocated on remote
>> > (numa) nodes.
>> > Since on your crash dump kernel, all the usable system memory (starting
>> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
>> >
>> > In my best guess, you can ignore them except for some performance penalty.
>> > This may be one side-effect.
>> >
>> > So does your crash dump kernel now boot successfully?
>> >
>>
>> Indeed. The crash dump kernel now boots successfully and the crash
>> dump core can be saved properly as well (I tried saving it to local
>> disk).
>
> Thank you for the confirmation.
> (I'd like to suggest you to examine the core dump with crash utility.)
>
>> However, the 'potential offnode page_structs' WARN messages hog the
>> console and delay crashkernel boot for a significant duration, which
>> can be irritating.
>>
>> Can we also consider ratelimiting this WARNING message [which seems to
>> come from vmemmap_verify()] if invoked in the context of crash kernel,
>> in addition to making the above change suggested by  you.
>
> Well, we may be able to change pr_warn() to pr_warn_once() here, but
> I hope that adding "numa=off" to kernel command line should also work.

Hmm, adding "numa=off" to the crashkernel bootargs works, and TBH it was
my initial thought as well, but I am not sure whether this will cause
any regressions on aarch64 systems which use the crashdump feature.

I think the second solution, i.e. limiting how often the warning is
printed, might be the better option. Can you please add the following
patch (maybe as a separate one) and send it along with the patch which
marks all areas other than the crashkernel region being passed to the
crashkernel as NOMAP, so that we can get this issue fixed in the
upstream aarch64 kernel:

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 17acf01791fa..4c13fe3c644d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
        int actual_node = early_pfn_to_nid(pfn);

        if (node_distance(actual_node, node) > LOCAL_DISTANCE)
-               pr_warn("[%lx-%lx] potential offnode page_structs\n",
+               pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
                        start, end - 1);
 }

I have tested this solution on a Huawei Taishan board: the crashkernel
boots successfully and the crash core is saved properly (without the
console warning flood which used to hold up the crashkernel boot).
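
For anyone following along, the effect of the _once variant can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: warn_once() and verify_ranges() are hypothetical stand-ins,
not the kernel's pr_warn_once() or vmemmap_verify(), and fprintf() takes
the place of printk. Each call site owns a static flag, so only the first
range that trips the check is reported:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int lines_printed;	/* how many warnings actually reached stderr */

/*
 * Hypothetical userspace stand-in for a print-once macro: the static flag
 * belongs to the call site, so the message is emitted at most once there.
 */
#define warn_once(...)					\
	do {						\
		static bool warned_;			\
		if (!warned_) {				\
			warned_ = true;			\
			lines_printed++;		\
			fprintf(stderr, __VA_ARGS__);	\
		}					\
	} while (0)

/*
 * Pretend that every one of the n ranges fails the node-distance check (as
 * in the reported log) and return how many warnings this call printed.
 */
static int verify_ranges(unsigned long long start, unsigned long long step,
			 int n)
{
	int before = lines_printed;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		unsigned long long s = start + (unsigned long long)i * step;

		warn_once("[%llx-%llx] potential offnode page_structs\n",
			  s, s + step - 1);
	}
	return lines_printed - before;
}
```

Calling verify_ranges(0xffff7fe008000000ULL, 0x10000ULL, 32) should emit a
single line instead of 32. The trade-off is that every later range is
silenced as well, so the message only says that off-node page structs
exist, not where all of them are.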

Thanks,
Bhupesh

^ permalink raw reply related	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-25 20:14                                                                                         ` Bhupesh Sharma
  (?)
@ 2017-12-26  1:32                                                                                             ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  1:32 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: AKASHI Takahiro, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 12/26/17 at 01:44am, Bhupesh Sharma wrote:
> On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> >> Hello Akashi,
> >> >>
> >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> > Bhupesh,
> >> >> >
> >> >> > Can you test the patch attached below, please?
> >> >> >
> >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> >> > exporting them via usable-memory-range.
> >> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >> >
> >> >> > Thanks,
> >> >> > -Takahiro AKASHI
> >> >> >
> >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> > (snip)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >> >>> > > >> >> , for details)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Right.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >> >>> > > >> >> , for details)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> [snip..]
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Reserved memory range
> >> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >> >>> > > >> >> {
> >> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >> >>> > > >> >>                 .size = 0,
> >> >> >> >>> > > >> >>         };
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         if (reg.size)
> >> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >> >> >>> > > >> >> comment this out */
> >> >> >> >>> > > >> >> }
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> >> >>> > > >> >> fail.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >> >>> > > >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
> >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >> >>> > > >> > initialization. Why does the crash dump kernel need to know about them?
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >>
> >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >> >>> > > >
> >> >> >> >>> > > > For my better understanding, who actually accesses such regions
> >> >> >> >>> > > > during boot time, UEFI itself or efistub?
> >> >> >> >>> > > >
> >> >> >> >>> > >
> >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >> >>> > > instance, on QEMU we have
> >> >> >> >>> > >
> >> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >> >> >>> > >   01000013)
> >> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >
> >> >> >> >>> > > covered by
> >> >> >> >>> > >
> >> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >> >>> > >  ...
> >> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >> >>> >
> >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >> >>> > UEFI boot services.
> >> >> >> >>> >
> >> >> >> >>> > >
> >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >> >>> > > >> when booting the next kernel.
> >> >> >> >>> > > >
> >> >> >> >>> > > > not really.
> >> >> >> >>> > > >
> >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >>
> >> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >> >>> > > >
> >> >> >> >>> > > > Since whether we should honor such reserved regions across kexec
> >> >> >> >>> > > > depends on each one's specific nature, we will have to handle them one by one.
> >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >> >>> > > > exposed to user space (via /proc/iomem).
> >> >> >> >>> > > >
> >> >> >> >>> > >
> >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >> >>> >
> >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >> >>> >
> >> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed
> >> >> >> >>> > through acpi_os_map_memory() and so won't need to be part of system
> >> >> >> >>> > RAM (or memblocks), I guess.
> >> >> >> >>> >     -> Bhupesh?
> >> >> >> >>>
> >> >> >> >>> I forgot how the arm64 kernel retrieves the memory ranges and
> >> >> >> >>> initializes them. If there is no "e820"-like interface, shouldn't the
> >> >> >> >>> kernel reinitialize all the memory according to the EFI memmap? For the
> >> >> >> >>> kdump kernel, anything other than usable memory (which comes from the dt
> >> >> >> >>> node instead) should be reinitialized according to the EFI-passed info, no?
> >> >> >> >>
> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >> >>
> >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> >> with multiple entries in usable-memory-range.
> >> >> >> >>
> >> >> >> >
> >> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> >> > being supplied to the kexec kernel.
> >> >> >> >
> >> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> >> > kernel's memory contents.
> >> >> >>
> >> >> >> Hmm. Wouldn't appending the ACPI reclaim regions to
> >> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> >> be better? It indirectly achieves a similar objective (although it
> >> >> >> may cover only a subset of all System RAM regions in the primary
> >> >> >> kernel's memory).
> >> >> >>
> >> >> >> I am not aware of the background about the current kexec-tools
> >> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> >> being passed to the crashkernel.
> >> >> >>
> >> >> >> Probably Akashi can better answer how we arrived at this design
> >> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> >> !NOMAP regions) to the crashkernel.
> >> >> >>
> >> >> >> I suspect that some issues were seen when the System RAM (!NOMAP)
> >> >> >> regions were exposed to the crashkernel, and that's why we settled on
> >> >> >> this design approach, but this is just my guess.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Bhupesh
> >> >> >>
> >> >> >> >>> >
> >> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to the crash dump kernel
> >> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >> >>>
> >> >> >> >>> memmap= is only used by old kexec-tools; now we pass them via the
> >> >> >> >>> e820 table.
> >> >> >> >>
> >> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >> >>
> >> >> >> >> -Takahiro AKASHI
> >> >> >> >>
> >> >> >> >>> [snip]
> >> >> >> >>>
> >> >> >> >>> Thanks
> >> >> >> >>> Dave
> >> >> >
> >> >> > ===8<==
> >> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> >> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> >> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >> >
> >> >> > ---
> >> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> >> > index 00e7b900ca41..8175db94257b 100644
> >> >> > --- a/arch/arm64/mm/init.c
> >> >> > +++ b/arch/arm64/mm/init.c
> >> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >> >         struct memblock_region reg = {
> >> >> >                 .size = 0,
> >> >> >         };
> >> >> > +       u64 idx;
> >> >> > +       phys_addr_t start, end;
> >> >> >
> >> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >
> >> >> > -       if (reg.size)
> >> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> >> > +       if (reg.size) {
> >> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> >> > +                                       &start, &end, NULL)
> >> >> > +                       memblock_mark_nomap(start, end - start);
> >> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> >> > +       }
> >> >> >  }
> >> >> >
> >> >> >  void __init arm64_memblock_init(void)
> >> >> > --
> >> >> > 2.15.1
> >> >> >
> >> >>
> >> >> Thanks for the patch. After applying this on top of
> >> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> >> crashkernel boot no longer hangs while trying to access the ACPI
> >> >> tables.
> >> >>
> >> >> However, I notice a minor issue. Please see the log below for
> >> >> reference: the following message keeps spamming the console, but I see
> >> >> the crashkernel boot proceed further:
> >> >>
> >> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
> >> >> page_structs
> >> >>
> >> >> [snip..]
> >> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
> >> >> page_structs
> >> >
> >> > These messages show that some "struct page" data are allocated on remote
> >> > (NUMA) nodes.
> >> > Since on your crash dump kernel, all the usable system memory (starting
> >> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >> >
> >> > My best guess is that you can ignore them, apart from some performance penalty.
> >> > This may be one side-effect.
> >> >
> >> > So does your crash dump kernel now boot successfully?
> >> >
> >>
> >> Indeed. The crash dump kernel now boots successfully and the crash
> >> dump core can be saved properly as well (I tried saving it to local
> >> disk).
> >
> > Thank you for the confirmation.
> > (I'd suggest that you examine the core dump with the crash utility.)
> >
> >> However, the 'potential offnode page_structs' WARN messages hog the
> >> console and delay crashkernel boot for a significant duration, which
> >> can be irritating.
> >>
> >> Can we also consider ratelimiting this WARNING message [which seems to
> >> come from vmemmap_verify()] when it is invoked in the context of the crash
> >> kernel, in addition to making the change you suggested above?
> >
> > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > I hope that adding "numa=off" to the kernel command line will also work.
> 
> Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> my initial thought as well, but I am not sure whether it will cause
> any regressions on aarch64 systems which use the crashdump feature.

It should be fine since we use numa=off by default for all other arches,
i.e. x86, ppc64 and s390. Actually, disabling NUMA in the kdump kernel can
also save memory used by the mm component.
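
To make the numa=off route concrete: a kdump setup only has to make sure
the parameter ends up on the crash kernel's command line. Below is a
minimal sketch of that step in C. kdump_cmdline() and has_param() are
hypothetical helpers written for this mail, not taken from kexec-tools or
any distro script; the sketch appends the parameter unless it is already
present as a whole token:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if param appears in cmdline as a whole space-delimited token. */
static int has_param(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	assert(len > 0);
	while ((p = strstr(p, param)) != NULL) {
		int starts = (p == cmdline) || (p[-1] == ' ');
		int ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return 1;
		p += len;
	}
	return 0;
}

/*
 * Hypothetical helper: copy cmdline into out and append "numa=off" if it is
 * not there yet. Returns 0 on success, -1 if out is too small.
 */
static int kdump_cmdline(const char *cmdline, char *out, size_t outsz)
{
	int n;

	if (has_param(cmdline, "numa=off"))
		n = snprintf(out, outsz, "%s", cmdline);
	else if (cmdline[0] == '\0')
		n = snprintf(out, outsz, "numa=off");
	else
		n = snprintf(out, outsz, "%s numa=off", cmdline);

	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

With kexec-tools the resulting string would typically be handed over with
--command-line=, or just the extra parameter with --append=.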

> 
> I think the 2nd solution, i.e. limiting how often the warning is
> printed, might be a better option. Can you please add the following
> patch (maybe as a separate one) and send it along with the patch which
> marks all areas other than the crashkernel region passed to the
> crashkernel as NOMAP, so that we can get this issue fixed in the
> upstream aarch64 kernel:
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 17acf01791fa..4c13fe3c644d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>         int actual_node = early_pfn_to_nid(pfn);
> 
>         if (node_distance(actual_node, node) > LOCAL_DISTANCE)
> -               pr_warn("[%lx-%lx] potential offnode page_structs\n",
> +               pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
>                         start, end - 1);
>  }
> 
> I have tested this solution on a Huawei Taishan board: the crashkernel
> boots successfully and the crash core is saved properly (without the
> console warning flood which used to hold up the crashkernel boot).
> 
> Thanks,
> Bhupesh

Thanks
Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  1:32                                                                                             ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  1:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/26/17 at 01:44am, Bhupesh Sharma wrote:
> On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> >> <takahiro.akashi@linaro.org> wrote:
> >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> >> Hello Akashi,
> >> >>
> >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> >> <takahiro.akashi@linaro.org> wrote:
> >> >> > Bhupesh,
> >> >> >
> >> >> > Can you test the patch attached below, please?
> >> >> >
> >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> >> > exporting them via usable-memory-range.
> >> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >> >
> >> >> > Thanks,
> >> >> > -Takahiro AKASHI
> >> >> >
> >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> >> <ard.biesheuvel@linaro.org> wrote:
> >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> > (snip)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >> >>> > > >> >> , for details)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Right.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >> >>> > > >> >> , for details)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> [snip..]
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Reserved memory range
> >> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >> >>> > > >> >> {
> >> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >> >>> > > >> >>                 .size = 0,
> >> >> >> >>> > > >> >>         };
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         if (reg.size)
> >> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >> >> >>> > > >> >> comment this out */
> >> >> >> >>> > > >> >> }
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> >> >>> > > >> >> fail.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >> >>> > > >> > about "ACPI Reclaim memory" to the crash dump kernel. As I understand it,
> >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >> >>> > > >> > initialization. Why does the crash dump kernel need to know about them?
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >>
> >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >> >>> > > >
> >> >> >> >>> > > > For my better understanding, who actually accesses such regions
> >> >> >> >>> > > > during boot time, UEFI itself or efistub?
> >> >> >> >>> > > >
> >> >> >> >>> > >
> >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >> >>> > > instance, on QEMU we have
> >> >> >> >>> > >
> >> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >> >> >>> > >   01000013)
> >> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >
> >> >> >> >>> > > covered by
> >> >> >> >>> > >
> >> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >> >>> > >  ...
> >> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >> >>> >
> >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >> >>> > UEFI boot services.
> >> >> >> >>> >
> >> >> >> >>> > >
> >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >> >>> > > >> when booting the next kernel.
> >> >> >> >>> > > >
> >> >> >> >>> > > > not really.
> >> >> >> >>> > > >
> >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >>
> >> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >> >>> > > >
> >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >> >>> > > > exposed to user space (via proc/iomem).
> >> >> >> >>> > > >
> >> >> >> >>> > >
> >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >> >>> >
> >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >> >>> >
> >> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
> >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> >> >>> > (or memblocks), I guess.
> >> >> >> >>> >     -> Bhupesh?
> >> >> >> >>>
> >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >> >> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >> >> >> >>> than usable memory (which is from the dt node instead) should be
> >> >> >> >>> reinitialized according to efi passed info, no?
> >> >> >> >>
> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >> >>
> >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> >> with multiple entries in usable-memory-range.
> >> >> >> >>
> >> >> >> >
> >> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> >> > being supplied to the kexec kernel.
> >> >> >> >
> >> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> >> > kernel's memory contents.
> >> >> >>
> >> >> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> >> be better? Because its indirectly achieving a similar objective
> >> >> >> (although may be a subset of all System RAM regions on the primary
> >> >> >> kernel's memory).
> >> >> >>
> >> >> >> I am not aware of the background about the current kexec-tools
> >> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> >> being passed to the crashkernel.
> >> >> >>
> >> >> >> Probably Akashi can answer better, as to how we arrived at this design
> >> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> >> ! NOMAP regions) to the crashkernel.
> >> >> >>
> >> >> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> >> finalized on this design approach, but this is something which is just
> >> >> >> my guess.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Bhupesh
> >> >> >>
> >> >> >> >>> >
> >> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >> >>>
> >> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >> >>> e820 table.
> >> >> >> >>
> >> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >> >>
> >> >> >> >> -Takahiro AKASHI
> >> >> >> >>
> >> >> >> >>> [snip]
> >> >> >> >>>
> >> >> >> >>> Thanks
> >> >> >> >>> Dave
> >> >> >
> >> >> > ===8<==
> >> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >> >
> >> >> > ---
> >> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> >> > index 00e7b900ca41..8175db94257b 100644
> >> >> > --- a/arch/arm64/mm/init.c
> >> >> > +++ b/arch/arm64/mm/init.c
> >> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >> >         struct memblock_region reg = {
> >> >> >                 .size = 0,
> >> >> >         };
> >> >> > +       u64 idx;
> >> >> > +       phys_addr_t start, end;
> >> >> >
> >> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >
> >> >> > -       if (reg.size)
> >> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> >> > +       if (reg.size) {
> >> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> >> > +                                       &start, &end, NULL)
> >> >> > +                       memblock_mark_nomap(start, end - start);
> >> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> >> > +       }
> >> >> >  }
> >> >> >
> >> >> >  void __init arm64_memblock_init(void)
> >> >> > --
> >> >> > 2.15.1
> >> >> >
> >> >>
> >> >> Thanks for the patch. After applying this on top of
> >> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> >> crashkernel boot no longer hangs while trying to access the acpi
> >> >> tables.
> >> >>
> >> >> However I notice a minor issue. Please see the log below for
> >> >> reference, the following message keeps spamming the console but I see
> >> >> the crashkernel boot proceed further:
> >> >>
> >> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
> >> >> page_structs
> >> >>
> >> >> [snip..]
> >> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
> >> >> page_structs
> >> >
> >> > These messages show that some "struct page" data are allocated on remote
> >> > (numa) nodes.
> >> > Since on your crash dump kernel, all the usable system memory (starting
> >> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >> >
> >> > In my best guess, you can ignore them except for some performance penalty.
> >> > This may be one side-effect.
> >> >
> >> > So does your crash dump kernel now boot successfully?
> >> >
> >>
> >> Indeed. The crash dump kernel now boots successfully and the crash
> >> dump core can be saved properly as well (I tried saving it to local
> >> disk).
> >
> > Thank you for the confirmation.
> > (I'd like to suggest you to examine the core dump with crash utility.)
> >
> >> However, the 'potential offnode page_structs' WARN messages hog the
> >> console and delay crashkernel boot for a significant duration, which
> >> can be irritating.
> >>
> >> Can we also consider ratelimiting this WARNING message [which seems to
> >> come from vmemmap_verify()] if invoked in the context of crash kernel,
> >> in addition to making the above change suggested by you?
> >
> > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > I hope that adding "numa=off" to kernel command line should also work.
> 
> Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> my initial thought process as well, but I am not sure if this will
> cause any regressions on aarch64 systems which use crashdump feature.

It should be fine, since we use numa=off by default for all other arches,
i.e. x86, ppc64 and s390. Actually, disabling NUMA in the kdump kernel can
reduce the memory used by the mm component.

> 
> I think the 2nd solution, i.e limiting the warn message print
> frequency might be a better option. Can you please add the following
> patch (may be as a separate one) and send it along the patch which
> marks all areas other than the crashkernel region being passed to the
> crashkernel as NOMAP, so that we can get this issue fixed in upstream
> aarch64 kernel:
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 17acf01791fa..4c13fe3c644d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>         int actual_node = early_pfn_to_nid(pfn);
> 
>         if (node_distance(actual_node, node) > LOCAL_DISTANCE)
> -               pr_warn("[%lx-%lx] potential offnode page_structs\n",
> +               pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
>                         start, end - 1);
>  }
> 
> I have tested this solution on huawei taishan board and can boot
> crashkernel successfully and also save the crash core properly
> (without the  console warn message flooding which used to hold up the
> crashkernel boot).
> 
> Thanks,
> Bhupesh

Thanks
Dave
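
For reference, the effect of the NOMAP patch quoted above can be modelled
in user space. This is only a sketch under stated assumptions, not kernel
code: struct region, region_within() and mark_unusable_nomap() are
hypothetical names, and the regions are assumed to be already split on the
boundaries of the usable range. It shows that a reserved region (such as
ACPI reclaim memory) stays mappable even though it lies outside
'linux,usable-memory-range', while the free memory of the crashed kernel
becomes NOMAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memblock.memory entry (not the kernel struct). */
struct region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models a memblock_reserve()'d range, e.g. ACPI reclaim */
	bool nomap;	/* models the MEMBLOCK_NOMAP flag */
};

/* True if r lies entirely inside [base, base + size). */
static bool region_within(const struct region *r, uint64_t base, uint64_t size)
{
	return r->base >= base && r->base + r->size <= base + size;
}

/*
 * Models the patched fdt_enforce_memory_region(): every free (non-reserved)
 * region is marked NOMAP, then the usable range is made mappable again.
 * Assumption: regions never straddle the usable-range boundaries.
 */
static void mark_unusable_nomap(struct region *regs, size_t n,
				uint64_t usable_base, uint64_t usable_size)
{
	assert(regs != NULL || n == 0);

	/* Step 1: hide all free memory, as the for_each_free_mem_range() loop does. */
	for (size_t i = 0; i < n; i++)
		if (!regs[i].reserved)
			regs[i].nomap = true;

	/* Step 2: re-expose the crash kernel's own range. */
	for (size_t i = 0; i < n; i++)
		if (region_within(&regs[i], usable_base, usable_size))
			regs[i].nomap = false;
}
```

With the crashkernel range from the kexec output in this thread
(0x0e800000-0x2e7fffff) as the usable range, a free region below it ends up
NOMAP, the crashkernel range itself stays mappable, and a reserved region
elsewhere is left untouched.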


* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  1:32                                                                                             ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  1:32 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec,
	AKASHI Takahiro, James Morse, Bhupesh SHARMA, linux-arm-kernel

On 12/26/17 at 01:44am, Bhupesh Sharma wrote:
> On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> >> <takahiro.akashi@linaro.org> wrote:
> >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> >> Hello Akashi,
> >> >>
> >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> >> <takahiro.akashi@linaro.org> wrote:
> >> >> > Bhupesh,
> >> >> >
> >> >> > Can you test the patch attached below, please?
> >> >> >
> >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> >> > exporting them via usable-memory-range.
> >> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >> >
> >> >> > Thanks,
> >> >> > -Takahiro AKASHI
> >> >> >
> >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> >> <ard.biesheuvel@linaro.org> wrote:
> >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> > (snip)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >> >>> > > >> >> , for details)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Right.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>                 /* add linux,usable-memory-range */
> >> >> >> >>> > > >> >>                 nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >> >>> > > >> >>                 result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >> >>> > > >> >>                                 PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >> >>> > > >> >>                                 address_cells, size_cells);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >> >>> > > >> >> , for details)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> [snip..]
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Reserved memory range
> >> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >> >>> > > >> >> {
> >> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >> >>> > > >> >>                 .size = 0,
> >> >> >> >>> > > >> >>         };
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         if (reg.size)
> >> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >> >> >>> > > >> >> comment this out */
> >> >> >> >>> > > >> >> }
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> >> >>> > > >> >> fail.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >>
> >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >> >>> > > >
> >> >> >> >>> > > > For my better understandings, who is actually accessing such regions
> >> >> >> >>> > > > during boot time, uefi itself or efistub?
> >> >> >> >>> > > >
> >> >> >> >>> > >
> >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >> >>> > > instance, on QEMU we have
> >> >> >> >>> > >
> >> >> >> >>> > >  ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >> >>> > >  ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS  BXPCFACP 00000001
> >> >> >> >>> > >   01000013)
> >> >> >> >>> > >  ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS  BXPCFACP 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS  BXPCDSDT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS  BXPCAPIC 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS  BXPCGTDT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS  BXPCMCFG 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS  BXPCSPCR 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >  ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS  BXPCIORT 00000001
> >> >> >> >>> > > BXPC 00000001)
> >> >> >> >>> > >
> >> >> >> >>> > > covered by
> >> >> >> >>> > >
> >> >> >> >>> > >  efi:   0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >> >>> > >  ...
> >> >> >> >>> > >  efi:   0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >> >>> >
> >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >> >>> > UEFI boot services.
> >> >> >> >>> >
> >> >> >> >>> > >
> >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >> >>> > > >> when booting the next kernel.
> >> >> >> >>> > > >
> >> >> >> >>> > > > not really.
> >> >> >> >>> > > >
> >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >>
> >> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >> >>> > > >
> >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >> >>> > > > exposed to user space (via proc/iomem).
> >> >> >> >>> > > >
> >> >> >> >>> > >
> >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >> >>> >
> >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >> >>> >
> >> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >> >>> > usable-memory-range to the crash dump kernel. They will be accessed through
> >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> >> >>> > (or memblocks), I guess.
> >> >> >> >>> >     -> Bhupesh?
> >> >> >> >>>
> >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize
> >> >> >> >>> them.  If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >> >> >>> the memory according to the efi memmap?  For kdump kernel anything other
> >> >> >> >>> than usable memory (which is from the dt node instead) should be
> >> >> >> >>> reinitialized according to efi passed info, no?
> >> >> >> >>
> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >> >>
> >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> >> with multiple entries in usable-memory-range.
> >> >> >> >>
> >> >> >> >
> >> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> >> > being supplied to the kexec kernel.
> >> >> >> >
> >> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> >> > kernel's memory contents.
> >> >> >>
> >> >> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> >> be better? Because its indirectly achieving a similar objective
> >> >> >> (although may be a subset of all System RAM regions on the primary
> >> >> >> kernel's memory).
> >> >> >>
> >> >> >> I am not aware of the background about the current kexec-tools
> >> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> >> being passed to the crashkernel.
> >> >> >>
> >> >> >> Probably Akashi can answer better, as to how we arrived at this design
> >> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> >> ! NOMAP regions) to the crashkernel.
> >> >> >>
> >> >> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> >> finalized on this design approach, but this is something which is just
> >> >> >> my guess.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Bhupesh
> >> >> >>
> >> >> >> >>> >
> >> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >> >>>
> >> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >> >>> e820 table.
> >> >> >> >>
> >> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >> >>
> >> >> >> >> -Takahiro AKASHI
> >> >> >> >>
> >> >> >> >>> [snip]
> >> >> >> >>>
> >> >> >> >>> Thanks
> >> >> >> >>> Dave
> >> >> >
> >> >> > ===8<==
> >> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >> >
> >> >> > ---
> >> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> >> > index 00e7b900ca41..8175db94257b 100644
> >> >> > --- a/arch/arm64/mm/init.c
> >> >> > +++ b/arch/arm64/mm/init.c
> >> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >> >         struct memblock_region reg = {
> >> >> >                 .size = 0,
> >> >> >         };
> >> >> > +       u64 idx;
> >> >> > +       phys_addr_t start, end;
> >> >> >
> >> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >
> >> >> > -       if (reg.size)
> >> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> >> > +       if (reg.size) {
> >> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> >> > +                                       &start, &end, NULL)
> >> >> > +                       memblock_mark_nomap(start, end - start);
> >> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> >> > +       }
> >> >> >  }
> >> >> >
> >> >> >  void __init arm64_memblock_init(void)
> >> >> > --
> >> >> > 2.15.1
> >> >> >
> >> >>
> >> >> Thanks for the patch. After applying this on top of
> >> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> >> crashkernel boot no longer hangs while trying to access the acpi
> >> >> tables.
> >> >>
> >> >> However I notice a minor issue. Please see the log below for
> >> >> reference, the following message keeps spamming the console but I see
> >> >> the crashkernel boot proceed further:
> >> >>
> >> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
> >> >> page_structs
> >> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
> >> >> page_structs
> >> >>
> >> >> [snip..]
> >> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
> >> >> page_structs
> >> >
> >> > These messages show that some "struct page" data are allocated on remote
> >> > (numa) nodes.
> >> > Since on your crash dump kernel, all the usable system memory (starting
> >> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >> >
> >> > In my best guess, you can ignore them except for some performance penalty.
> >> > This may be one side-effect.
> >> >
> >> > So does your crash dump kernel now boot successfully?
> >> >
> >>
> >> Indeed. The crash dump kernel now boots successfully and the crash
> >> dump core can be saved properly as well (I tried saving it to local
> >> disk).
> >
> > Thank you for the confirmation.
> > (I'd like to suggest you to examine the core dump with crash utility.)
> >
> >> However, the 'potential offnode page_structs' WARN messages hog the
> >> console and delay crashkernel boot for a significant duration, which
> >> can be irritating.
> >>
> >> Can we also consider ratelimiting this WARNING message [which seems to
> >> come from vmemmap_verify()] if invoked in the context of crash kernel,
> >> in addition to making the above change suggested by you?
> >
> > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > I hope that adding "numa=off" to kernel command line should also work.
> 
> Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> my initial thought process as well, but I am not sure if this will
> cause any regressions on aarch64 systems which use crashdump feature.

It should be fine, since we use numa=off by default for all other arches,
i.e. x86, ppc64 and s390. Actually, disabling NUMA in the kdump kernel can
reduce the memory used by the mm component.

> 
> I think the 2nd solution, i.e. limiting how often the warning is printed,
> might be a better option. Can you please add the following patch (maybe as
> a separate one) and send it along with the patch which marks all areas
> other than the crashkernel region being passed to the crashkernel as
> NOMAP, so that we can get this issue fixed in the upstream aarch64 kernel:
> 
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 17acf01791fa..4c13fe3c644d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>         int actual_node = early_pfn_to_nid(pfn);
> 
>         if (node_distance(actual_node, node) > LOCAL_DISTANCE)
> -               pr_warn("[%lx-%lx] potential offnode page_structs\n",
> +               pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
>                         start, end - 1);
>  }
> 
> I have tested this solution on a Huawei Taishan board and can boot the
> crashkernel successfully and also save the crash core properly
> (without the console warning flood which used to hold up the
> crashkernel boot).
> 
> Thanks,
> Bhupesh

Thanks
Dave
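
To illustrate the difference between pr_warn() and pr_warn_once() that the
patch above relies on, here is a minimal userspace sketch. It is not the
kernel's implementation: WARN_ONCE_SKETCH, lines_printed and
report_offnode_blocks() are hypothetical names, and the 64 KiB block size is
only taken from the address ranges in the log. The idea is that a static flag
at the call site suppresses every report after the first one.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many lines were actually printed, so the effect is observable. */
static int lines_printed;

/*
 * Hypothetical userspace stand-in for a "print once" helper: each expansion
 * gets its own static flag, so only the first hit at that call site prints.
 */
#define WARN_ONCE_SKETCH(fmt, ...)                                  \
	do {                                                        \
		static bool warned_;                                \
		if (!warned_) {                                     \
			warned_ = true;                             \
			fprintf(stderr, fmt, __VA_ARGS__);          \
			lines_printed++;                            \
		}                                                   \
	} while (0)

/*
 * Walk nr_blocks consecutive 64 KiB ranges and report each one through the
 * print-once helper. Returns the total number of lines printed so far.
 */
static int report_offnode_blocks(unsigned long long start,
				 unsigned int nr_blocks)
{
	const unsigned long long block_size = 0x10000ULL;
	unsigned int i;

	for (i = 0; i < nr_blocks; i++) {
		unsigned long long s = start + i * block_size;

		WARN_ONCE_SKETCH("[%llx-%llx] potential offnode page_structs\n",
				 s, s + block_size - 1);
	}
	return lines_printed;
}
```

Calling report_offnode_blocks() for any number of blocks prints a single
line, which is why the console flood goes away. The trade-off is that the
ranges of all later blocks are no longer reported.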

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  1:32                                                                                             ` Dave Young
  (?)
@ 2017-12-26  1:35                                                                                                 ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  1:35 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: AKASHI Takahiro, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

[snip]
> > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > I hope that adding "numa=off" to kernel command line should also work.
> > 
> > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > my initial thought process as well, but I am not sure if this will
> > cause any regressions on aarch64 systems which use crashdump feature.
> 
> It should be fine since we use numa=off by default for all other arches
> ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> mm component memory usage. 
> 

Forgot to say: I mean that in RHEL and Fedora we use numa=off for kdump.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  1:35                                                                                                 ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  1:35 UTC (permalink / raw)
  To: linux-arm-kernel

[snip]
> > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > I hope that adding "numa=off" to kernel command line should also work.
> > 
> > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > my initial thought process as well, but I am not sure if this will
> > cause any regressions on aarch64 systems which use crashdump feature.
> 
> It should be fine since we use numa=off by default for all other arches
> ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> mm component memory usage. 
> 

Forgot to say: I mean that in RHEL and Fedora we use numa=off for kdump.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  1:35                                                                                                 ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  1:35 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec,
	AKASHI Takahiro, James Morse, Bhupesh SHARMA, linux-arm-kernel

[snip]
> > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > I hope that adding "numa=off" to kernel command line should also work.
> > 
> > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > my initial thought process as well, but I am not sure if this will
> > cause any regressions on aarch64 systems which use crashdump feature.
> 
> It should be fine since we use numa=off by default for all other arches
> ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> mm component memory usage. 
> 

Forgot to say: I mean that in RHEL and Fedora we use numa=off for kdump.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  1:35                                                                                                 ` Dave Young
  (?)
@ 2017-12-26  2:28                                                                                                     ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-26  2:28 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> [snip]
> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > I hope that adding "numa=off" to kernel command line should also work.
> > > 
> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > my initial thought process as well, but I am not sure if this will
> > > cause any regressions on aarch64 systems which use crashdump feature.
> > 
> > It should be fine since we use numa=off by default for all other arches
> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > mm component memory usage. 
> > 
> 
> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..

Thank you for the clarification.
(It might be better to turn NUMA off automatically if maxcpus == 0 (and 1?).)

-Takahiro AKASHI
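
The suggestion above can be sketched as a small policy function. This is only
an illustration under stated assumptions: numa_should_be_off() is a
hypothetical helper, not an existing kernel function, and treating
maxcpus == 1 the same as 0 is the tentative part of the proposal ("and 1?").
The sketch also ignores that a single CPU can still see memory on several
nodes.

```c
#include <stdbool.h>

/*
 * Hypothetical policy helper: decide whether NUMA should be switched off,
 * given the "maxcpus" boot value and whether numa=off was passed explicitly.
 */
static bool numa_should_be_off(unsigned int maxcpus, bool numa_off_requested)
{
	if (numa_off_requested)
		return true;	/* an explicit numa=off always wins */

	return maxcpus <= 1;	/* 0 or 1 CPU: treat the system as non-NUMA */
}
```

With such a rule, a crash kernel booted with maxcpus=1 would behave as if
numa=off had been given, without depending on the bootargs chosen by the
kdump scripts.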

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  2:28                                                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-26  2:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> [snip]
> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > I hope that adding "numa=off" to kernel command line should also work.
> > > 
> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > my initial thought process as well, but I am not sure if this will
> > > cause any regressions on aarch64 systems which use crashdump feature.
> > 
> > It should be fine since we use numa=off by default for all other arches
> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > mm component memory usage. 
> > 
> 
> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..

Thank you for the clarification.
(It might be better to turn NUMA off automatically if maxcpus == 0 (and 1?).)

-Takahiro AKASHI

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  2:28                                                                                                     ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-26  2:28 UTC (permalink / raw)
  To: Dave Young
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming,
	Bhupesh Sharma, kexec, James Morse, Bhupesh SHARMA,
	linux-arm-kernel

On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> [snip]
> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > I hope that adding "numa=off" to kernel command line should also work.
> > > 
> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > my initial thought process as well, but I am not sure if this will
> > > cause any regressions on aarch64 systems which use crashdump feature.
> > 
> > It should be fine since we use numa=off by default for all other arches
> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > mm component memory usage. 
> > 
> 
> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..

Thank you for the clarification.
(It might be better to turn NUMA off automatically if maxcpus == 0 (and 1?).)

-Takahiro AKASHI

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  2:28                                                                                                     ` AKASHI Takahiro
  (?)
@ 2017-12-26  2:56                                                                                                         ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-26  2:56 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>> [snip]
>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>> > > > I hope that adding "numa=off" to kernel command line should also work.
>> > >
>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>> > > my initial thought process as well, but I am not sure if this will
>> > > cause any regressions on aarch64 systems which use crashdump feature.
>> >
>> > It should be fine since we use numa=off by default for all other arches
>> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
>> > mm component memory usage.
>> >
>>
>> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
>
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>

I am not sure we can leave this to the distribution-specific kdump
scripts, since the crashkernel boot can be held up long enough to
appear stuck. These scripts differ across distributions (e.g. Ubuntu
and RHEL/Fedora) and may pass different bootarg options.

So how about a kernel-only fix which doesn't rely on changing the
distribution-specific kdump scripts? We should avoid introducing a
regression while trying to fix a regression :)

Just my 2 cents.

Thanks,
Bhupesh
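
To make the concern about differing bootargs concrete: whether the crash
kernel sees "numa=off" depends entirely on the command line that a given
distribution's kdump scripts assemble. Below is a minimal sketch of checking a
command line for a whole-token parameter. cmdline_has_param() is a
hypothetical helper written for this illustration; it is not how the kernel
parses its command line, and it only treats spaces as separators.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `param` appears in `cmdline` as a complete,
 * space-separated token (so "numa=offline" does not match "numa=off").
 */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, param)) != NULL) {
		bool starts_token = (p == cmdline) || (p[-1] == ' ');
		bool ends_token = (p[len] == '\0') || (p[len] == ' ');

		if (starts_token && ends_token)
			return true;
		p++;	/* keep searching after this partial match */
	}
	return false;
}
```

A command line such as "root=/dev/sda1 maxcpus=1" fails this check, which is
the case where only a kernel-side change would avoid the warning flood.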

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  2:56                                                                                                         ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-26  2:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>> [snip]
>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>> > > > I hope that adding "numa=off" to kernel command line should also work.
>> > >
>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>> > > my initial thought process as well, but I am not sure if this will
>> > > cause any regressions on aarch64 systems which use crashdump feature.
>> >
>> > It should be fine since we use numa=off by default for all other arches
>> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
>> > mm component memory usage.
>> >
>>
>> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
>
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>

I am not sure we can leave this to the distribution-specific kdump
scripts, since the crashkernel boot can be held up long enough to
appear stuck. These scripts differ across distributions (e.g. Ubuntu
and RHEL/Fedora) and may pass different bootarg options.

So how about a kernel-only fix which doesn't rely on changing the
distribution-specific kdump scripts? We should avoid introducing a
regression while trying to fix a regression :)

Just my 2 cents.

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  2:56                                                                                                         ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-26  2:56 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi,
	Mark Rutland, James Morse, kexec

On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>> [snip]
>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>> > > > I hope that adding "numa=off" to kernel command line should also work.
>> > >
>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>> > > my initial thought process as well, but I am not sure if this will
>> > > cause any regressions on aarch64 systems which use crashdump feature.
>> >
>> > It should be fine since we use numa=off by default for all other arches
>> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
>> > mm component memory usage.
>> >
>>
>> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
>
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>

I am not sure we can leave this to the distribution-specific kdump
scripts, since the crashkernel boot can be held up long enough to
appear stuck. These scripts differ across distributions (e.g. Ubuntu
and RHEL/Fedora) and may pass different bootarg options.

So how about a kernel-only fix which doesn't rely on changing the
distribution-specific kdump scripts? We should avoid introducing a
regression while trying to fix a regression :)

Just my 2 cents.

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  2:28                                                                                                     ` AKASHI Takahiro
  (?)
@ 2017-12-26  6:56                                                                                                         ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  6:56 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 12/26/17 at 11:28am, AKASHI Takahiro wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > [snip]
> > > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > > I hope that adding "numa=off" to kernel command line should also work.
> > > > 
> > > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > > my initial thought process as well, but I am not sure if this will
> > > > cause any regressions on aarch64 systems which use crashdump feature.
> > > 
> > > It should be fine since we use numa=off by default for all other arches
> > > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > > mm component memory usage. 
> > > 
> > 
> > Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> 
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)

Hmm, I did a quick test with qemu/kvm, booting the kdump kernel without
numa=off. I'm not sure why I do not see the warning messages on x86
machines; maybe it is something arm64-specific?

> 
> -Takahiro AKASHI
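
For reference, the condition in the vmemmap_verify() hunk quoted earlier in
this thread is node_distance(actual_node, node) > LOCAL_DISTANCE. The sketch
below models only that comparison. The two-node distance table and the names
node_distance_sketch and is_offnode() are assumptions made for illustration;
10 is the value the kernel defines for LOCAL_DISTANCE.

```c
#include <stdbool.h>

#define LOCAL_DISTANCE_SKETCH	10	/* kernel's LOCAL_DISTANCE is 10 */
#define NR_NODES_SKETCH		2

/* Hypothetical two-node distance table: 10 on the diagonal, 20 off it. */
static const int node_distance_sketch[NR_NODES_SKETCH][NR_NODES_SKETCH] = {
	{ 10, 20 },
	{ 20, 10 },
};

/*
 * True when the node that actually backs the page structs (actual_node)
 * is not local to the node they describe (node).
 */
static bool is_offnode(int actual_node, int node)
{
	return node_distance_sketch[actual_node][node] > LOCAL_DISTANCE_SKETCH;
}
```

The check can only fire when at least two nodes with a distance above
LOCAL_DISTANCE are known to the kernel. A guest that exposes a single node
would never satisfy it, which might explain a quiet qemu/kvm boot, but that is
a guess and is not confirmed anywhere in this thread.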

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  6:56                                                                                                         ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  6:56 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/26/17 at 11:28am, AKASHI Takahiro wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > [snip]
> > > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > > I hope that adding "numa=off" to kernel command line should also work.
> > > > 
> > > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > > my initial thought process as well, but I am not sure if this will
> > > > cause any regressions on aarch64 systems which use crashdump feature.
> > > 
> > > It should be fine since we use numa=off by default for all other arches
> > > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > > mm component memory usage. 
> > > 
> > 
> > Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> 
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)

Hmm, I did a quick test with qemu/kvm, booting the kdump kernel without
numa=off. I'm not sure why I do not see the warning messages on x86
machines; maybe it is something arm64-specific?

> 
> -Takahiro AKASHI

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  6:56                                                                                                         ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  6:56 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA,
	Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland,
	James Morse, kexec

On 12/26/17 at 11:28am, AKASHI Takahiro wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > [snip]
> > > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > > I hope that adding "numa=off" to kernel command line should also work.
> > > > 
> > > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > > my initial thought process as well, but I am not sure if this will
> > > > cause any regressions on aarch64 systems which use crashdump feature.
> > > 
> > > It should be fine since we use numa=off by default for all other arches
> > > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > > mm component memory usage. 
> > > 
> > 
> > Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> 
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)

Hmm, I did a quick test with qemu/kvm, booting the kdump kernel without
numa=off. I'm not sure why I do not see the warning messages on x86
machines; maybe it is something arm64-specific?

> 
> -Takahiro AKASHI

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  2:56                                                                                                         ` Bhupesh Sharma
  (?)
@ 2017-12-26  6:58                                                                                                             ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  6:58 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: AKASHI Takahiro, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 12/26/17 at 08:26am, Bhupesh Sharma wrote:
> On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> >> [snip]
> >> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> >> > > > I hope that adding "numa=off" to kernel command line should also work.
> >> > >
> >> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> >> > > my initial thought process as well, but I am not sure if this will
> >> > > cause any regressions on aarch64 systems which use crashdump feature.
> >> >
> >> > It should be fine since we use numa=off by default for all other arches
> >> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> >> > mm component memory usage.
> >> >
> >>
> >> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> >
> > Thank you for the clarification.
> > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> >
> 
> Not sure if we can leave this to the distribution-specific kdump
> scripts (as the crashkernel boot can be held up for sufficient time
> and may appear stuck). The distribution scripts may be different (for
> e.g. ubuntu and RHEL/fedora) across distributions and may have
> different bootarg options.

Personally, I think the distribution should take care of this parameter
for kdump.  But as AKASHI said, it could also be an issue when the first
kernel boots with nr_cpus=1.  The open question is why we do not see this
issue on other machines.

> 
> So how about considering a kernel fix only which doesn't require
> relying on changing the distribution-specific kdump scripts, as we
> should avoid introducing a regression while trying to fix a regression
> :)
> 
> Just my 2 cents.
> 
> Thanks,
> Bhupesh

Thanks
Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  6:58                                                                                                             ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  6:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/26/17 at 08:26am, Bhupesh Sharma wrote:
> On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> >> [snip]
> >> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> >> > > > I hope that adding "numa=off" to kernel command line should also work.
> >> > >
> >> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> >> > > my initial thought process as well, but I am not sure if this will
> >> > > cause any regressions on aarch64 systems which use crashdump feature.
> >> >
> >> > It should be fine since we use numa=off by default for all other arches
> >> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> >> > mm component memory usage.
> >> >
> >>
> >> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> >
> > Thank you for the clarification.
> > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> >
> 
> Not sure if we can leave this to the distribution-specific kdump
> scripts (as the crashkernel boot can be held up for sufficient time
> and may appear stuck). The distribution scripts may be different (for
> e.g. ubuntu and RHEL/fedora) across distributions and may have
> different bootarg options.

Personally, I think the distribution should take care of this parameter
for kdump.  But as AKASHI said, it could also be an issue when the first
kernel boots with nr_cpus=1.  The open question is why we do not see this
issue on other machines.

> 
> So how about considering a kernel fix only which doesn't require
> relying on changing the distribution-specific kdump scripts, as we
> should avoid introducing a regression while trying to fix a regression
> :)
> 
> Just my 2 cents.
> 
> Thanks,
> Bhupesh

Thanks
Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-26  6:58                                                                                                             ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26  6:58 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec,
	AKASHI Takahiro, James Morse, Bhupesh SHARMA, linux-arm-kernel

On 12/26/17 at 08:26am, Bhupesh Sharma wrote:
> On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> >> [snip]
> >> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> >> > > > I hope that adding "numa=off" to kernel command line should also work.
> >> > >
> >> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> >> > > my initial thought process as well, but I am not sure if this will
> >> > > cause any regressions on aarch64 systems which use crashdump feature.
> >> >
> >> > It should be fine since we use numa=off by default for all other arches
> >> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> >> > mm component memory usage.
> >> >
> >>
> >> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> >
> > Thank you for the clarification.
> > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> >
> 
> Not sure if we can leave this to the distribution-specific kdump
> scripts (as the crashkernel boot can be held up for sufficient time
> and may appear stuck). The distribution scripts may be different (for
> e.g. ubuntu and RHEL/fedora) across distributions and may have
> different bootarg options.

Personally, I think the distribution should take care of this parameter
for kdump.  But as AKASHI said, it could also be an issue when the first
kernel boots with nr_cpus=1.  The open question is why we do not see this
issue on other machines.

> 
> So how about considering a kernel fix only which doesn't require
> relying on changing the distribution-specific kdump scripts, as we
> should avoid introducing a regression while trying to fix a regression
> :)
> 
> Just my 2 cents.
> 
> Thanks,
> Bhupesh

Thanks
Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  2:56                                                                                                         ` Bhupesh Sharma
  (?)
@ 2018-01-08 20:00                                                                                                             ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2018-01-08 20:00 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel,
	Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hello Akashi,

On Tue, Dec 26, 2017 at 8:26 AM, Bhupesh Sharma <bhsharma-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>>> [snip]
>>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>>> > > > I hope that adding "numa=off" to kernel command line should also work.
>>> > >
>>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>>> > > my initial thought process as well, but I am not sure if this will
>>> > > cause any regressions on aarch64 systems which use crashdump feature.
>>> >
>>> > It should be fine since we use numa=off by default for all other arches
>>> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
>>> > mm component memory usage.
>>> >
>>>
>>> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
>>
>> Thank you for the clarification.
>> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>>
>
> Not sure if we can leave this to the distribution-specific kdump
> scripts (as the crashkernel boot can be held up for sufficient time
> and may appear stuck). The distribution scripts may be different (for
> e.g. ubuntu and RHEL/fedora) across distributions and may have
> different bootarg options.
>
> So how about considering a kernel fix only which doesn't require
> relying on changing the distribution-specific kdump scripts, as we
> should avoid introducing a regression while trying to fix a regression
> :)
>
> Just my 2 cents.
>

Sorry for the delay; I was on holiday last week.

Are you planning to send a patch to fix this issue, or do you want me
to send an RFC version instead?

I think this is a blocking issue for aarch64 kdump support on newer
kernels (v4.14), and we are already hearing about this issue from other
users as well, so it would be great to get this fixed now that we have
root-caused the issue and found a possible workaround.

Regards,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2018-01-08 20:00                                                                                                             ` Bhupesh Sharma
  (?)
@ 2018-01-09  4:42                                                                                                                 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2018-01-09  4:42 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Dave Young, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

Bhupesh,

On Tue, Jan 09, 2018 at 01:30:07AM +0530, Bhupesh Sharma wrote:
> Hello Akashi,
> 
> On Tue, Dec 26, 2017 at 8:26 AM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> > On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> >>> [snip]
> >>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> >>> > > > I hope that adding "numa=off" to kernel command line should also work.
> >>> > >
> >>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> >>> > > my initial thought process as well, but I am not sure if this will
> >>> > > cause any regressions on aarch64 systems which use crashdump feature.
> >>> >
> >>> > It should be fine since we use numa=off by default for all other arches,
> >>> > i.e. x86, ppc64 and s390. Actually, disabling NUMA in the kdump kernel can
> >>> > save mm component memory usage.
> >>> >
> >>>
> >>> Forgot to say: I mean that in RHEL and Fedora we use numa=off for kdump.
> >>
> >> Thank you for the clarification.
> >> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> >>
> >
> > Not sure if we can leave this to the distribution-specific kdump
> > scripts (as the crashkernel boot can be held up long enough that it
> > may appear stuck). The kdump scripts may differ across
> > distributions (e.g. Ubuntu and RHEL/Fedora) and may use
> > different bootarg options.
> >
> > So how about considering a kernel-only fix that doesn't rely on
> > changing the distribution-specific kdump scripts, as we
> > should avoid introducing a regression while trying to fix a regression
> > :)
> >
> > Just my 2 cents.
> >
> 
> Sorry for the delay; I was on holiday last week.
> 
> Are you planning to send a patch to fix this issue, or do you want me
> to send an RFC version instead?

I should have submitted my own patch before my new year holidays,
but I will do so as soon as possible.

Thanks,
-Takahiro AKASHI


> I think this is a blocking issue for aarch64 kdump support on newer
> kernels (v4.14), and we are already hearing about it from other
> users as well, so it would be great to get this fixed now that we have
> root-caused the issue and found a possible workaround.
> 
> Regards,
> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  6:56                                                                                                         ` Dave Young
  (?)
@ 2018-01-09  5:02                                                                                                             ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2018-01-09  5:02 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On Tue, Dec 26, 2017 at 02:56:36PM +0800, Dave Young wrote:
> On 12/26/17 at 11:28am, AKASHI Takahiro wrote:
> > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > > [snip]
> > > > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > > > I hope that adding "numa=off" to kernel command line should also work.
> > > > > 
> > > > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > > > my initial thought process as well, but I am not sure if this will
> > > > > cause any regressions on aarch64 systems which use crashdump feature.
> > > > 
> > > > It should be fine since we use numa=off by default for all other arches,
> > > > i.e. x86, ppc64 and s390. Actually, disabling NUMA in the kdump kernel can
> > > > save mm component memory usage.
> > > > 
> > > 
> > > Forgot to say: I mean that in RHEL and Fedora we use numa=off for kdump.
> > 
> > Thank you for the clarification.
> > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> 
> Hmm, I did a quick test with qemu/kvm, booting the kdump kernel without
> numa=off. I'm not sure why I do not see the warning messages on x86
> machines; maybe it is something arm64-specific?

I didn't see the messages (i.e. "potential offnode page_structs")
on arm64 QEMU (with -smp 2 -numa node -numa node).

It seems that QEMU doesn't generate an ACPI SLIT (inter-node distance table).
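
A minimal sketch for checking this on a running system, not something posted in this thread: Linux normally exposes the ACPI tables it received from firmware under /sys/firmware/acpi/tables, one file per table signature, so a missing SLIT file there would be consistent with the observation above. The helper name `has_acpi_table` is hypothetical, and the directory is passed as a parameter so the check can be pointed at a different path.

```python
import os


def has_acpi_table(signature, tables_dir="/sys/firmware/acpi/tables"):
    """Return True if a file named after the ACPI signature exists in tables_dir."""
    # Assumption: each table is exposed as a regular file named by its signature.
    return os.path.isfile(os.path.join(tables_dir, signature))


if __name__ == "__main__":
    # SLIT is the table that carries the inter-node distance matrix.
    print("SLIT present:", has_acpi_table("SLIT"))
```

Run inside the guest, this only reports whether the table was provided; it says nothing about why the warnings do or do not appear.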

Thanks,
-Takahiro AKASHI

> > 
> > -Takahiro AKASHI

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26  6:58                                                                                                             ` Dave Young
  (?)
@ 2018-01-09  5:22                                                                                                                 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2018-01-09  5:22 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On Tue, Dec 26, 2017 at 02:58:45PM +0800, Dave Young wrote:
> On 12/26/17 at 08:26am, Bhupesh Sharma wrote:
> > On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> > > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > >> [snip]
> > >> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > >> > > > I hope that adding "numa=off" to kernel command line should also work.
> > >> > >
> > >> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > >> > > my initial thought process as well, but I am not sure if this will
> > >> > > cause any regressions on aarch64 systems which use crashdump feature.
> > >> >
> > >> > It should be fine since we use numa=off by default for all other arches,
> > >> > i.e. x86, ppc64 and s390. Actually, disabling NUMA in the kdump kernel can
> > >> > save mm component memory usage.
> > >> >
> > >>
> > >> Forgot to say: I mean that in RHEL and Fedora we use numa=off for kdump.
> > >
> > > Thank you for the clarification.
> > > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> > >
> > 
> > Not sure if we can leave this to the distribution-specific kdump
> > scripts (as the crashkernel boot can be held up long enough that it
> > may appear stuck). The kdump scripts may differ across
> > distributions (e.g. Ubuntu and RHEL/Fedora) and may use
> > different bootarg options.
> 
> Personally I think the distribution should take care of this param for
> kdump.  But as AKASHI said, it could be an issue for the 1st kernel
> booting with nr_cpus=1.  The question is why we do not see this issue on
> other machines.

The issue isn't kdump-specific. Theoretically, it can also occur
when "mem=" is specified on a NUMA system.

Since we can avoid the annoying messages by adding "numa=off", I'm reluctant
to suppress all of the messages but the first. My suggestion here is to add
some notes on the NUMA case to Documentation/kdump/kdump.txt.
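
As a rough illustration of the workaround (a sketch only, not a patch from this thread), a kdump setup script could make sure "numa=off" ends up on the crash kernel command line exactly once. The helper name `with_numa_off` and the sample command line are hypothetical; the splitting is deliberately naive and assumes no parameter value contains quoted spaces.

```python
def with_numa_off(cmdline: str) -> str:
    """Return cmdline with a single numa=off, dropping any other numa= option."""
    # Simplification: kernel parameters are treated as whitespace-separated tokens.
    params = [p for p in cmdline.split() if not p.startswith("numa=")]
    params.append("numa=off")
    return " ".join(params)


if __name__ == "__main__":
    # Hypothetical crash kernel command line, for illustration only.
    print(with_numa_off("root=/dev/sda2 acpi=force maxcpus=1"))
```

Replacing any existing numa= token rather than blindly appending keeps the result the same however many times the script runs.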

Thanks,
Takahiro AKASHI


> > 
> > So how about considering a kernel-only fix that doesn't rely on
> > changing the distribution-specific kdump scripts, as we
> > should avoid introducing a regression while trying to fix a regression
> > :)
> > 
> > Just my 2 cents.
> > 
> > Thanks,
> > Bhupesh
> 
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2018-01-09  4:42                                                                                                                 ` AKASHI Takahiro
  (?)
@ 2018-01-09 11:46                                                                                                                     ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2018-01-09 11:46 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Dave Young, Ard Biesheuvel,
	Bhupesh SHARMA, Matt Fleming,
	linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On Tue, Jan 9, 2018 at 10:12 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Bhupesh,
>
> On Tue, Jan 09, 2018 at 01:30:07AM +0530, Bhupesh Sharma wrote:
>> Hello Akashi,
>>
>> On Tue, Dec 26, 2017 at 8:26 AM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>> > On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>> >>> [snip]
>> >>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>> >>> > > > I hope that adding "numa=off" to kernel command line should also work.
>> >>> > >
>> >>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>> >>> > > my initial thought process as well, but I am not sure if this will
>> >>> > > cause any regressions on aarch64 systems which use crashdump feature.
>> >>> >
>> >>> > It should be fine since we use numa=off by default for all other arches,
>> >>> > i.e. x86, ppc64 and s390. Actually, disabling NUMA in the kdump kernel can
>> >>> > save mm component memory usage.
>> >>> >
>> >>>
>> >>> Forgot to say: I mean that in RHEL and Fedora we use numa=off for kdump.
>> >>
>> >> Thank you for the clarification.
>> >> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>> >>
>> >
>> > Not sure if we can leave this to the distribution-specific kdump
>> > scripts (as the crashkernel boot can be held up long enough that it
>> > may appear stuck). The kdump scripts may differ across
>> > distributions (e.g. Ubuntu and RHEL/Fedora) and may use
>> > different bootarg options.
>> >
>> > So how about considering a kernel-only fix that doesn't rely on
>> > changing the distribution-specific kdump scripts, as we
>> > should avoid introducing a regression while trying to fix a regression
>> > :)
>> >
>> > Just my 2 cents.
>> >
>>
>> Sorry for the delay, but I was on holiday last week.
>>
>> Are you planning to send a patch to fix this issue, or do you want me
>> to send an RFC version instead?
>
> I should have submitted my own patch before my new year holidays,
> but I will do so as soon as possible.

Thanks for the confirmation.
I look forward to the patches and will give them a go on the arm64
boards available to me.

Regards,
Bhupesh

>
>> I think this is a blocking issue for aarch64 kdump support on newer
>> kernels (v4.14) and we are already hearing about this issue from other
>> users as well, so it would be great to get this fixed now that we have
>> root-caused the issue and found a possible workaround.
>>
>> Regards,
>> Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread

Thread overview: 135+ messages
2017-11-10 12:09 arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP Bhupesh Sharma
2017-11-10 12:09 ` Bhupesh Sharma
     [not found] ` <CACi5LpM_95ebYFguPTyjWk+qHT5rDJVXiYDkNWbszo6Zw41zRA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-10 12:11   ` Bhupesh Sharma
2017-11-10 12:11     ` Bhupesh Sharma
     [not found]     ` <CACi5LpNV_E9pvhTwLcy6vtEj9qbL1ZEHe-5sv=iiW0k9JxPD1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-13  9:27       ` AKASHI Takahiro
2017-11-13  9:27         ` AKASHI Takahiro
     [not found]         ` <20171113092730.GA29552-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-11-14 11:20           ` Ard Biesheuvel
2017-11-14 11:20             ` Ard Biesheuvel
     [not found]             ` <CAKv+Gu_eQ-s0J22tKeHKJme4qXcvxvDkS7vKrNW+o_XtMTkMhQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-11-15 10:58               ` Bhupesh Sharma
2017-11-15 10:58                 ` Bhupesh Sharma
     [not found]                 ` <3df4c6c5-0abe-01ee-730d-2edaa5f497d2-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-11-16  7:00                   ` AKASHI Takahiro
2017-11-16  7:00                     ` AKASHI Takahiro
     [not found]                     ` <20171116070005.GI29552-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-11-26  8:29                       ` Bhupesh SHARMA
2017-11-26  8:29                         ` Bhupesh SHARMA
     [not found]                         ` <CAFTCetQHmpprAVu6uYO+rc5Xi4EUVhmovbmSaU6nM1n1mAH62w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-04 14:02                           ` Ard Biesheuvel
2017-12-04 14:02                             ` Ard Biesheuvel
     [not found]                             ` <CAKv+Gu9oda1Ee8AoXsCEw+Bjn-XF3wZA_CsxvqhjtT6_bmJ7uA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-12 21:51                               ` Bhupesh Sharma
2017-12-12 21:51                                 ` Bhupesh Sharma
     [not found]                                 ` <CACi5LpOZ=WOx14gTwH5jfLozepT2Jw8JSY5x+bfEZ_YaiQvFpw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-13 10:26                                   ` AKASHI Takahiro
2017-12-13 10:26                                     ` AKASHI Takahiro
     [not found]                                     ` <20171213102624.GC28046-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-12-13 10:49                                       ` Ard Biesheuvel
2017-12-13 10:49                                         ` Ard Biesheuvel
     [not found]                                         ` <CAKv+Gu_BmFN9Zg861SCS+R=V4khFykjuOzkmfEknsL=NvWW3Eg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-13 12:16                                           ` AKASHI Takahiro
2017-12-13 12:16                                             ` AKASHI Takahiro
     [not found]                                             ` <20171213121605.GE28046-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-12-13 12:17                                               ` Ard Biesheuvel
2017-12-13 12:17                                                 ` Ard Biesheuvel
     [not found]                                                 ` <CAKv+Gu_G8kBEAdAznVauZVAdJOFkr1vmu0Gf6tOwJfH2CgdufA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-13 19:22                                                   ` Bhupesh SHARMA
2017-12-13 19:22                                                     ` Bhupesh SHARMA
2017-12-15  8:59                                                   ` AKASHI Takahiro
2017-12-15  8:59                                                     ` AKASHI Takahiro
2017-12-15  9:35                                                     ` Ard Biesheuvel
2017-12-15  9:35                                                       ` Ard Biesheuvel
     [not found]                                                       ` <CAKv+Gu-W5VpVrgA=FVZCCevksaRGOVvPdE+B8WkpZc6AE1jOPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-17 21:01                                                         ` Bhupesh Sharma
2017-12-17 21:01                                                           ` Bhupesh Sharma
2017-12-18  5:16                                                           ` Dave Young
2017-12-18  5:16                                                             ` Dave Young
2017-12-18  5:16                                                             ` Dave Young
2017-12-18  5:16                                                             ` Dave Young
2017-12-18  5:54                                                             ` AKASHI Takahiro
2017-12-18  5:54                                                               ` AKASHI Takahiro
2017-12-18  5:54                                                               ` AKASHI Takahiro
2017-12-18  5:54                                                               ` AKASHI Takahiro
2017-12-18  8:59                                                               ` Bhupesh SHARMA
2017-12-18  8:59                                                                 ` Bhupesh SHARMA
2017-12-18  8:59                                                                 ` Bhupesh SHARMA
2017-12-18  8:59                                                                 ` Bhupesh SHARMA
     [not found]                                                                 ` <CAFTCetQ55zUKe25jSku0DHp8uVZA4hB32d5W6MSCNsTVpxu7Gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-18 11:18                                                                   ` AKASHI Takahiro
2017-12-18 11:18                                                                     ` AKASHI Takahiro
2017-12-18 11:18                                                                     ` AKASHI Takahiro
2017-12-18 11:18                                                                     ` AKASHI Takahiro
2017-12-18 22:28                                                                     ` Bhupesh Sharma
2017-12-18 22:28                                                                       ` Bhupesh Sharma
2017-12-18 22:28                                                                       ` Bhupesh Sharma
2017-12-18 22:28                                                                       ` Bhupesh Sharma
2017-12-19  5:01                                                                   ` AKASHI Takahiro
2017-12-19  5:01                                                                     ` AKASHI Takahiro
2017-12-19  5:01                                                                     ` AKASHI Takahiro
2017-12-19  5:01                                                                     ` AKASHI Takahiro
     [not found]                                                                     ` <20171219050113.GF28046-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-12-20 19:52                                                                       ` Bhupesh Sharma
2017-12-20 19:52                                                                         ` Bhupesh Sharma
2017-12-20 19:52                                                                         ` Bhupesh Sharma
2017-12-20 19:52                                                                         ` Bhupesh Sharma
2017-12-18 21:28                                                             ` Bhupesh Sharma
2017-12-18 21:28                                                               ` Bhupesh Sharma
2017-12-18 21:28                                                               ` Bhupesh Sharma
2017-12-18 21:28                                                               ` Bhupesh Sharma
2017-12-19  5:25                                                               ` AKASHI Takahiro
2017-12-19  5:25                                                                 ` AKASHI Takahiro
2017-12-19  5:25                                                                 ` AKASHI Takahiro
2017-12-19  5:25                                                                 ` AKASHI Takahiro
2017-12-18  5:40                                                     ` Dave Young
2017-12-18  5:40                                                       ` Dave Young
2017-12-18  5:43                                                       ` Dave Young
2017-12-18  5:43                                                         ` Dave Young
2017-12-18  5:43                                                         ` Dave Young
2017-12-18  5:43                                                         ` Dave Young
     [not found]                                                       ` <20171218054009.GA6392-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2017-12-19  6:09                                                         ` AKASHI Takahiro
2017-12-19  6:09                                                           ` AKASHI Takahiro
     [not found]                                                           ` <20171219060927.GH28046-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-12-19 13:09                                                             ` Ard Biesheuvel
2017-12-19 13:09                                                               ` Ard Biesheuvel
     [not found]                                                               ` <CAKv+Gu-gmbWdZ7rxp5qGrtSBQ7dM=3FqF-Pw=J0LaL=oKTMg4w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-20 20:00                                                                 ` Bhupesh Sharma
2017-12-20 20:00                                                                   ` Bhupesh Sharma
     [not found]                                                                   ` <CACi5LpOscbcBecWaC3Q9P22kheRYc+M2Ynfusszk14fPY-cJ5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-21 10:34                                                                     ` AKASHI Takahiro
2017-12-21 10:34                                                                       ` AKASHI Takahiro
2017-12-21 10:34                                                                       ` AKASHI Takahiro
     [not found]                                                                       ` <20171221103440.GJ28046-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-12-21 12:06                                                                         ` Bhupesh Sharma
2017-12-21 12:06                                                                           ` Bhupesh Sharma
2017-12-21 12:06                                                                           ` Bhupesh Sharma
     [not found]                                                                           ` <CACi5LpMUnUKxiALAHW9_PE2RYC8GNWLPGpdJ5ca53g=v3rNkfg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-22  8:33                                                                             ` AKASHI Takahiro
2017-12-22  8:33                                                                               ` AKASHI Takahiro
2017-12-22  8:33                                                                               ` AKASHI Takahiro
2017-12-23 19:51                                                                               ` Bhupesh Sharma
2017-12-23 19:51                                                                                 ` Bhupesh Sharma
2017-12-23 19:51                                                                                 ` Bhupesh Sharma
     [not found]                                                                                 ` <CACi5LpNF5i3Eo7nMLr_z9r4VVbXhDwSJCQoiOh-A_jB6hV0_2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-25  3:25                                                                                   ` AKASHI Takahiro
2017-12-25  3:25                                                                                     ` AKASHI Takahiro
2017-12-25  3:25                                                                                     ` AKASHI Takahiro
     [not found]                                                                                     ` <20171225032500.GA8877-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-12-25 20:14                                                                                       ` Bhupesh Sharma
2017-12-25 20:14                                                                                         ` Bhupesh Sharma
2017-12-25 20:14                                                                                         ` Bhupesh Sharma
     [not found]                                                                                         ` <CACi5LpMzYidDaC0_yfwgVOisH-FqcNViYj+Z54uKfUtHkJKKXA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-26  1:32                                                                                           ` Dave Young
2017-12-26  1:32                                                                                             ` Dave Young
2017-12-26  1:32                                                                                             ` Dave Young
     [not found]                                                                                             ` <20171226013217.GA2119-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2017-12-26  1:35                                                                                               ` Dave Young
2017-12-26  1:35                                                                                                 ` Dave Young
2017-12-26  1:35                                                                                                 ` Dave Young
     [not found]                                                                                                 ` <20171226013517.GA2186-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2017-12-26  2:28                                                                                                   ` AKASHI Takahiro
2017-12-26  2:28                                                                                                     ` AKASHI Takahiro
2017-12-26  2:28                                                                                                     ` AKASHI Takahiro
     [not found]                                                                                                     ` <20171226022807.GB8877-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2017-12-26  2:56                                                                                                       ` Bhupesh Sharma
2017-12-26  2:56                                                                                                         ` Bhupesh Sharma
2017-12-26  2:56                                                                                                         ` Bhupesh Sharma
     [not found]                                                                                                         ` <CACi5LpNRtXh-j9Y9HwRatDZwRMr++-ZeaSnk62vD3btpxsVv7w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-26  6:58                                                                                                           ` Dave Young
2017-12-26  6:58                                                                                                             ` Dave Young
2017-12-26  6:58                                                                                                             ` Dave Young
     [not found]                                                                                                             ` <20171226065845.GB5354-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2018-01-09  5:22                                                                                                               ` AKASHI Takahiro
2018-01-09  5:22                                                                                                                 ` AKASHI Takahiro
2018-01-09  5:22                                                                                                                 ` AKASHI Takahiro
2018-01-08 20:00                                                                                                           ` Bhupesh Sharma
2018-01-08 20:00                                                                                                             ` Bhupesh Sharma
2018-01-08 20:00                                                                                                             ` Bhupesh Sharma
     [not found]                                                                                                             ` <CACi5LpNeSNHoUcM9xOq0bjN_okaEUDbaz1qyuqAct7BSNLQqKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-09  4:42                                                                                                               ` AKASHI Takahiro
2018-01-09  4:42                                                                                                                 ` AKASHI Takahiro
2018-01-09  4:42                                                                                                                 ` AKASHI Takahiro
     [not found]                                                                                                                 ` <20180109030717.GA18820-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2018-01-09 11:46                                                                                                                   ` Bhupesh Sharma
2018-01-09 11:46                                                                                                                     ` Bhupesh Sharma
2018-01-09 11:46                                                                                                                     ` Bhupesh Sharma
2017-12-26  6:56                                                                                                       ` Dave Young
2017-12-26  6:56                                                                                                         ` Dave Young
2017-12-26  6:56                                                                                                         ` Dave Young
     [not found]                                                                                                         ` <20171226065636.GA5354-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2018-01-09  5:02                                                                                                           ` AKASHI Takahiro
2018-01-09  5:02                                                                                                             ` AKASHI Takahiro
2018-01-09  5:02                                                                                                             ` AKASHI Takahiro
2017-11-24  8:47                   ` Dave Young
2017-11-24  8:47                     ` Dave Young
