* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
From: Bhupesh Sharma @ 2017-11-10 12:09 UTC
To: Ard Biesheuvel, akahiro.akashi-QSEj5FYQhm4dnm+yROfE0A
Cc: Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, james.morse-5wv7dgnIgG8, Bhupesh SHARMA

Hi Ard, Akashi

I have met an issue on an arm64 board using the latest master branch from Linus.

I think I have a dirty hack to avoid the issue, but would want more
opinions from you as it might break crashkernel dump on other arm64
machines.

1. This arm64 machine supports acpi only boot mode (i.e. acpi=force is
always set in bootargs).

2. Since f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
ACPI reclaim memory as MEMBLOCK_NOMAP), Ard rightly added a chunk
which marks the 'EFI_ACPI_RECLAIM_MEMORY' regions as usable memory
(by setting the EFI_MEMORY_WB flag for such efi memory descriptors
thus marking them as System RAM).

3. However this causes crashkernel booting on ACPI only machines to fail:

[ 0.039205] ACPI: Core revision 20170728
pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
[ 0.095098] Internal error: Oops: 96000021 [#1] SMP
[ 0.100022] Modules linked in:
[ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
[ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
[ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
[ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
[ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045
[ 0.132647] sp : ffff000008ccfb40
[ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
[ 0.141354] x27: ffff0000088be820 x26: 0000000000000000
[ 0.146718] x25: 000000000000001b x24: 0000000000000001
[ 0.152083] x23: 0000000000000001 x22: ffff000009710027
[ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
[ 0.162812] x19: 000000000000001b x18: 0000000000000005
[ 0.168176] x17: 0000000000000000 x16: 0000000000000000
[ 0.173541] x15: 0000000000000000 x14: 000000000000038e
[ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
[ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76
[ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
[ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
[ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
[ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
[ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000
[ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
[ 0.223224] Call trace:
[ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
[ 0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0
[ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000
[ 0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006
[ 0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
[ 0.263843] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
[ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001
[ 0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
[ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40
[ 0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18
[ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764
[ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
[ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
[ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
[ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
[ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
[ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
[ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
[ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
[ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
[ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
[ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
[ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
[ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
[ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
[ 0.394500] ---[ end trace c46ed37f9651c58e ]---
[ 0.399160] Kernel panic - not syncing: Fatal exception
[ 0.404437] Rebooting in 10 seconds..

4. On the primary kernel boot, I notice with efi=debug that the
ACPI regions are properly recognized as Reclaim regions:

...
[ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000039770000-0x0000397affff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
...
[ 0.000000] efi: 0x0000398a0000-0x0000398bffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]

and appear correctly as early memory node ranges:

[ 0.000000] Early memory node ranges
...
[ 0.000000] node 0: [mem 0x00000000396c0000-0x000000003975ffff]
[ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff]
[ 0.000000] node 0: [mem 0x0000000039770000-0x00000000397affff]
[ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff]
[ 0.000000] node 0: [mem 0x00000000398a0000-0x00000000398bffff]

5. However, when the crashkernel is booted, I see that although the
same regions are recognized as Reclaim regions, they do not appear in
the "Early Memory node range" entries:

[ 141.348355] Starting crashdump kernel...
[ 141.352269] Bye!
...
[ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000039770000-0x0000397affff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
...
[ 0.000000] efi: 0x0000398a0000-0x0000398bffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]

...
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x000000000e800000-0x000000002e7fffff]
[ 0.000000] node 0: [mem 0x0000000039620000-0x00000000396bffff]
[ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff]
[ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff]
[ 0.000000] node 0: [mem 0x00000000398c0000-0x0000000039d3ffff]
[ 0.000000] node 0: [mem 0x000000003ed30000-0x000000003ed5ffff]

( ^^^ No entry for ACPI Reclaim Memory regions)

6a. Also I see that during the primary kernel boot:

'acpi_os_ioremap' calls 'ioremap_cache' for the ACPI Reclaim Memory regions.

6b. But during the crashkernel boot, 'acpi_os_ioremap' calls
'ioremap' for the ACPI Reclaim Memory regions and not the _cache
variant.

7. I think this is because of how the memblock regions are mapped in
'arch/arm64/mm/mmu.c':

I am not quite sure if I fully understand the trick we have presently
inside '__init map_mem(pgd_t *pgd)', but reading the comment below:

/*
 * Take care not to create a writable alias for the
 * read-only text and rodata sections of the kernel image.
 * So temporarily mark them as NOMAP to skip mappings in
 * the following for-loop
 */

I think we are marking only the kernel text and crashkernel regions
with NO_CONT_MAPPINGS or NO_CONT_MAPPINGS and NO_BLOCK_MAPPINGS.

8. Also, I think now the crashkernel handling changed by
e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
memblock regions explicitly in iomem), needs to be changed to handle
the change added by Ard to fix this issue on ACPI only machines.

I have a dirty hack in place, but I would like to have your opinions
about what can be a more concrete fix to this issue (as we mark these
regions as System RAM now rather than NOMAP) and I don't have a DTB
based machine to test on currently.

Please share your views.

Regards,
Bhupesh
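
A minimal sketch for cross-checking points 4 to 6b, using only the address ranges printed in the logs above. It assumes, as a simplification, that 'acpi_os_ioremap' picks 'ioremap_cache' when the physical address lies inside the kernel's known memory ranges and plain 'ioremap' otherwise; that matches the behaviour reported in 6a/6b but is not confirmed in this mail. The helper names is_memory() and remap_choice() are hypothetical, and the primary-kernel node list is partial because the log elides its earlier entries.

```python
# ACPI Reclaim ranges as printed by efi=debug (identical in both kernels).
ACPI_RECLAIM = [
    (0x39710000, 0x3975FFFF),
    (0x39770000, 0x397AFFFF),
    (0x398A0000, 0x398BFFFF),
]

# "Early memory node ranges" from the primary kernel (partial list).
PRIMARY_NODES = [
    (0x396C0000, 0x3975FFFF),
    (0x39760000, 0x3976FFFF),
    (0x39770000, 0x397AFFFF),
    (0x397B0000, 0x3989FFFF),
    (0x398A0000, 0x398BFFFF),
]

# "Early memory node ranges" from the crashkernel boot.
CRASH_NODES = [
    (0x0E800000, 0x2E7FFFFF),
    (0x39620000, 0x396BFFFF),
    (0x39760000, 0x3976FFFF),
    (0x397B0000, 0x3989FFFF),
    (0x398C0000, 0x39D3FFFF),
    (0x3ED30000, 0x3ED5FFFF),
]


def is_memory(addr, nodes):
    """Hypothetical stand-in for a 'is this address known RAM?' check."""
    return any(start <= addr <= end for start, end in nodes)


def remap_choice(addr, nodes):
    """Assumed policy: cached mapping for known RAM, device mapping otherwise."""
    return "ioremap_cache" if is_memory(addr, nodes) else "ioremap"


def report(nodes):
    """Return (start address, assumed remap variant) for each reclaim range."""
    return [(hex(start), remap_choice(start, nodes)) for start, _ in ACPI_RECLAIM]


if __name__ == "__main__":
    print("primary:", report(PRIMARY_NODES))
    print("crash:  ", report(CRASH_NODES))
```

Under that assumption, every reclaim range resolves to the cached variant with the primary kernel's node list and to the uncached variant with the crashkernel's list, which lines up with the observation in 6a and 6b.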
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
From: Bhupesh Sharma @ 2017-11-10 12:11 UTC
To: Ard Biesheuvel, takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A
Cc: Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, james.morse-5wv7dgnIgG8, Bhupesh SHARMA

Resent with Akashi's correct email address.

On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> Hi Ard, Akashi
>
> I have met an issue on an arm64 board using the latest master branch from Linus.
>
> I think I have a dirty hack to avoid the issue, but would want more
> opinions from you as it might break crashkernel dump on other arm64
> machines.
>
> 1. This arm64 machine supports acpi only boot mode (i.e. acpi=force is
> always set in bootargs).
>
> 2. Since f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
> ACPI reclaim memory as MEMBLOCK_NOMAP), Ard rightly added a chunk
> which marks the 'EFI_ACPI_RECLAIM_MEMORY' regions as usable memory
> (by setting the EFI_MEMORY_WB flag for such efi memory descriptors
> thus marking them as System RAM).
>
> 3. However this causes crashkernel booting on ACPI only machines to fail:
>
> [ 0.039205] ACPI: Core revision 20170728
> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> [ 0.095098] Internal error: Oops: 96000021 [#1] SMP
> [ 0.100022] Modules linked in:
> [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045
> [ 0.132647] sp : ffff000008ccfb40
> [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000
> [ 0.146718] x25: 000000000000001b x24: 0000000000000001
> [ 0.152083] x23: 0000000000000001 x22: ffff000009710027
> [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> [ 0.162812] x19: 000000000000001b x18: 0000000000000005
> [ 0.168176] x17: 0000000000000000 x16: 0000000000000000
> [ 0.173541] x15: 0000000000000000 x14: 000000000000038e
> [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000
> [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
> [ 0.223224] Call trace:
> [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
> [ 0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0
> [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000
> [ 0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006
> [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000
> [ 0.263843] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b
> [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001
> [ 0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820
> [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40
> [ 0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18
> [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764
> [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
> [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
> [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
> [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
> [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
> [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
> [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
> [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
> [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
> [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
> [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
> [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> [ 0.394500] ---[ end trace c46ed37f9651c58e ]---
> [ 0.399160] Kernel panic - not syncing: Fatal exception
> [ 0.404437] Rebooting in 10 seconds..
>
> 4. On the primary kernel boot, I notice with efi=debug that the
> ACPI regions are properly recognized as Reclaim regions:
>
> ...
> [ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
> [ 0.000000] efi: 0x000039770000-0x0000397affff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
> ...
> [ 0.000000] efi: 0x0000398a0000-0x0000398bffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
>
> and appear correctly as early memory node ranges:
>
> [ 0.000000] Early memory node ranges
> ...
> [ 0.000000] node 0: [mem 0x00000000396c0000-0x000000003975ffff]
> [ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff]
> [ 0.000000] node 0: [mem 0x0000000039770000-0x00000000397affff]
> [ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff]
> [ 0.000000] node 0: [mem 0x00000000398a0000-0x00000000398bffff]
>
> 5. However, when the crashkernel is booted, I see that although the
> same regions are recognized as Reclaim regions, they do not appear in
> the "Early Memory node range" entries:
>
> [ 141.348355] Starting crashdump kernel...
> [ 141.352269] Bye!
> ...
> [ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
> [ 0.000000] efi: 0x000039770000-0x0000397affff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
> ...
> [ 0.000000] efi: 0x0000398a0000-0x0000398bffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]
>
> ...
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x000000000e800000-0x000000002e7fffff]
> [ 0.000000] node 0: [mem 0x0000000039620000-0x00000000396bffff]
> [ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff]
> [ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff]
> [ 0.000000] node 0: [mem 0x00000000398c0000-0x0000000039d3ffff]
> [ 0.000000] node 0: [mem 0x000000003ed30000-0x000000003ed5ffff]
>
> ( ^^^ No entry for ACPI Reclaim Memory regions)
>
> 6a. Also I see that during the primary kernel boot:
>
> 'acpi_os_ioremap' calls 'ioremap_cache' for the ACPI Reclaim Memory regions.
>
> 6b. But during the crashkernel boot, 'acpi_os_ioremap' calls
> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> variant.
>
> 7. I think this is because of how the memblock regions are mapped in
> 'arch/arm64/mm/mmu.c':
>
> I am not quite sure if I fully understand the trick we have presently
> inside '__init map_mem(pgd_t *pgd)', but reading the comment below:
>
> /*
>  * Take care not to create a writable alias for the
>  * read-only text and rodata sections of the kernel image.
>  * So temporarily mark them as NOMAP to skip mappings in
>  * the following for-loop
>  */
>
> I think we are marking only the kernel text and crashkernel regions
> with NO_CONT_MAPPINGS or NO_CONT_MAPPINGS and NO_BLOCK_MAPPINGS.
>
> 8. Also, I think now the crashkernel handling changed by
> e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
> memblock regions explicitly in iomem), needs to be changed to handle
> the change added by Ard to fix this issue on ACPI only machines.
>
> I have a dirty hack in place, but I would like to have your opinions
> about what can be a more concrete fix to this issue (as we mark these
> regions as System RAM now rather than NOMAP) and I don't have a DTB
> based machine to test on currently.
>
> Please share your views.
>
> Regards,
> Bhupesh
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-11-10 12:11 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-11-10 12:11 UTC (permalink / raw) To: linux-arm-kernel Resent with Akashi's correct email address. On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma@redhat.com> wrote: > Hi Ard, Akashi > > I have met an issue on an arm64 board using the latest master branch from Linus. > > I think I have a dirty hack to avoid the issue, but would want more > opinions from you as it might break crashkernel dump on other arm64 > machines. > > 1. This arm64 machine supports acpi only boot mode (i.e. acpi=force is > always set in bootargs) > > 2. Since f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark > ACPI reclaim memory as MEMBLOCK_NOMAP), Ard rightly added a chunk > which marks the 'EFI_ACPI_RECLAIM_MEMORY' regions as useable memory > (by setting the EFI_MEMORY_WB flag for such efi memory descriptors > thus marking them as System RAM). > > 3. 
However this causes crashkernel booting on ACPI only machines to fail: > > [ 0.039205] ACPI: Core revision 20170728 > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > [ 0.100022] Modules linked in: > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > pstate: 60000045 > [ 0.132647] sp : ffff000008ccfb40 > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > [ 0.223224] Call trace: > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > ffff0000095e3980 ffff000008ccfbe0 > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > ffff000008ccfc50 0000000000000000 > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > 00000000ffffff76 0000000000000006 > [ 0.255931] fa60: ffffffffffffffff 
ffffffff00000000 > 000000000000038e 0000000000000000 > [ 0.263843] fa80: 0000000000000000 0000000000000000 > 0000000000000005 000000000000001b > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > ffff000009710027 0000000000000001 > [ 0.279667] fac0: 0000000000000001 000000000000001b > 0000000000000000 ffff0000088be820 > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > ffff00000849b4f8 ffff000008ccfb40 > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > ffff000008ccfb40 ffff000008260a18 > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > ffff000008ccfb40 ffff0000084a6764 > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > [ 0.399160] Kernel panic - not syncing: Fatal exception > [ 0.404437] Rebooting in 10 seconds.. > > 4. On the primary kernel boot, I notice with efi=debug that while the > ACPI regions are properly recognized as Reclaim regions: > > ... 
> [ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim > Memory| | | | | | | | |WB|WT|WC|UC] > [ 0.000000] efi: 0x000039770000-0x0000397affff [ACPI Reclaim > Memory| | | | | | | | |WB|WT|WC|UC] > ... > [ 0.000000] efi: 0x0000398a0000-0x0000398bffff [ACPI Reclaim > Memory| | | | | | | | |WB|WT|WC|UC] > > And appear correctly as early memory node ranges: > > [ 0.000000] Early memory node ranges > ... > [ 0.000000] node 0: [mem 0x00000000396c0000-0x000000003975ffff] > [ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff] > [ 0.000000] node 0: [mem 0x0000000039770000-0x00000000397affff] > [ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff] > [ 0.000000] node 0: [mem 0x00000000398a0000-0x00000000398bffff] > > 5. However when the crashkernel is boot'ed I see that although the > same regions are recognized as Reclaim regions they do not appear in > the "Early Memory node range" entries: > > [ 141.348355] Starting crashdump kernel... > [ 141.352269] Bye! > ... > [ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim > Memory| | | | | | | | |WB|WT|WC|UC] > [ 0.000000] efi: 0x000039770000-0x0000397affff [ACPI Reclaim > Memory| | | | | | | | |WB|WT|WC|UC] > ... > [ 0.000000] efi: 0x0000398a0000-0x0000398bffff [ACPI Reclaim > Memory| | | | | | | | |WB|WT|WC|UC] > > ... > [ 0.000000] Early memory node ranges > [ 0.000000] node 0: [mem 0x000000000e800000-0x000000002e7fffff] > [ 0.000000] node 0: [mem 0x0000000039620000-0x00000000396bffff] > [ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff] > [ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff] > [ 0.000000] node 0: [mem 0x00000000398c0000-0x0000000039d3ffff] > [ 0.000000] node 0: [mem 0x000000003ed30000-0x000000003ed5ffff] > > ( ^^^ No entry for ACPI Reclaim Memory regions) > > 6a. Also I see that during the primary kernel boot: > > 'acpi_os_ioremap' calls 'ioremap_cache' for the ACPI Reclaim Memory regions. > > 6b. 
But during the crashkernel boot, ''acpi_os_ioremap' calls > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > variant. > > 7. I think this is because of how the memblock regions are mapped in > 'arch/arm64/mm/mmu.c': > > I am not quite sure if I fully understand the trick we have presently > inside '__init map_mem(pgd_t *pgd)', but reading the comment below: > > /* > * Take care not to create a writable alias for the > * read-only text and rodata sections of the kernel image. > * So temporarily mark them as NOMAP to skip mappings in > * the following for-loop > */ > > I think we are marking only the kernel text and crashkernel regions > with NO_CONT_MAPPINGS or NO_CONT_MAPPINGS and NO_BLOCK_MAPPINGS > > 8. Also, I think now the crashkernel handling changed by > e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved > memblock regions explicitly in iomem), needs to be changed to handle > the change added by Ard to fix this issue on ACPI only machines. > > I have a dirty hack in place, but I would like to have your opinions > about what can be a more concrete fix to this issue (as we mark these > regions as System RAM now rather than NOMAP) and I don't have a DTB > based machine to test on currently. > > Please share your views. > > Regards, > Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-11-10 12:11 ` Bhupesh Sharma @ 2017-11-13 9:27 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-11-13 9:27 UTC (permalink / raw) To: Bhupesh Sharma Cc: Ard Biesheuvel, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, james.morse-5wv7dgnIgG8, Bhupesh SHARMA Hi, On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote: > Resent with Akashi's correct email address. > > On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > > Hi Ard, Akashi > > > > I have met an issue on an arm64 board using the latest master branch from Linus. (snip) > > > > 8. Also, I think now the crashkernel handling changed by > > e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved > > memblock regions explicitly in iomem), needs to be changed to handle > > the change added by Ard to fix this issue on ACPI only machines. > > > > I have a dirty hack in place, but I would like to have your opinions > > about what can be a more concrete fix to this issue (as we mark these > > regions as System RAM now rather than NOMAP) and I don't have a DTB > > based machine to test on currently. I don't know much about acpi reclaim regions, can you please tell me how your change affects your panic case? Thanks, -Takahiro AKASHI > > Please share your views. > > > > Regards, > > Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-11-13 9:27 ` AKASHI Takahiro @ 2017-11-14 11:20 ` Ard Biesheuvel -1 siblings, 0 replies; 135+ messages in thread From: Ard Biesheuvel @ 2017-11-14 11:20 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, Bhupesh SHARMA On 13 November 2017 at 09:27, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > Hi, > > On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote: >> Resent with Akashi's correct email address. >> >> On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >> > Hi Ard, Akashi >> > >> > I have met an issue on an arm64 board using the latest master branch from Linus. > (snip) >> > >> > 8. Also, I think now the crashkernel handling changed by >> > e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved >> > memblock regions explicitly in iomem), needs to be changed to handle >> > the change added by Ard to fix this issue on ACPI only machines. >> > >> > I have a dirty hack in place, but I would like to have your opinions >> > about what can be a more concrete fix to this issue (as we mark these >> > regions as System RAM now rather than NOMAP) and I don't have a DTB >> > based machine to test on currently. > > I don't know much about acpi reclaim regions, > can you please tell me how your change affects your panic case? > Does this help at all? 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 7768423b39d3..61d867647cca 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -213,7 +213,7 @@ static void __init request_standard_resources(void) for_each_memblock(memory, region) { res = alloc_bootmem_low(sizeof(*res)); - if (memblock_is_nomap(region)) { + if (memblock_is_nomap(region) || memblock_is_reserved(region)) { res->name = "reserved"; res->flags = IORESOURCE_MEM; } else { ^ permalink raw reply related [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-11-14 11:20 ` Ard Biesheuvel @ 2017-11-15 10:58 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-11-15 10:58 UTC (permalink / raw) To: Ard Biesheuvel, AKASHI Takahiro, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, Bhupesh SHARMA

Hi Ard, Akashi,

On 11/14/2017 04:50 PM, Ard Biesheuvel wrote:
> On 13 November 2017 at 09:27, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> Hi,
>>
>> On Fri, Nov 10, 2017 at 05:41:56PM +0530, Bhupesh Sharma wrote:
>>> Resent with Akashi's correct email address.
>>>
>>> On Fri, Nov 10, 2017 at 5:39 PM, Bhupesh Sharma <bhsharma-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>>> Hi Ard, Akashi
>>>>
>>>> I have met an issue on an arm64 board using the latest master branch from Linus.
>> (snip)
>>>>
>>>> 8. Also, I think now the crashkernel handling changed by
>>>> e7cd190385d17790cc3eb3821b1094b00aacf325 (arm64: mark reserved
>>>> memblock regions explicitly in iomem), needs to be changed to handle
>>>> the change added by Ard to fix this issue on ACPI only machines.
>>>>
>>>> I have a dirty hack in place, but I would like to have your opinions
>>>> about what can be a more concrete fix to this issue (as we mark these
>>>> regions as System RAM now rather than NOMAP) and I don't have a DTB
>>>> based machine to test on currently.
>>
>> I don't know much about acpi reclaim regions,
>> can you please tell me how your change affects your panic case?

Sorry, I was away yesterday and couldn't get back with the details of the dirty hack. I see Ard has already proposed the following change, which looks similar to the one I made locally; however, it doesn't seem to fix the issue completely at my end so far. Here are more details:

>
> Does this help at all?
>
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 7768423b39d3..61d867647cca 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -213,7 +213,7 @@ static void __init request_standard_resources(void)
>
>         for_each_memblock(memory, region) {
>                 res = alloc_bootmem_low(sizeof(*res));
> -               if (memblock_is_nomap(region)) {
> +               if (memblock_is_nomap(region) || memblock_is_reserved(region)) {
>                         res->name = "reserved";
>                         res->flags = IORESOURCE_MEM;
>                 } else {
>

So, I tried using the 'memblock_is_reserved' check in 'request_standard_resources'; however, as 'memblock_is_reserved' expects a physical address (phys_addr_t) as its input argument, I changed mine to something like this:

-               if (memblock_is_nomap(region)) {
+               if (memblock_is_nomap(region) || memblock_is_reserved(__pfn_to_phys(memblock_region_reserved_base_pfn(region)))) {

However, I see that I am still hitting the issue, and it is quite a peculiar one.

First, some more background on what is happening on this Huawei Taishan arm64 board that I have:

1a. I see from the boot logs that one of the ACPI tables (DSDT) is at physical address 0x39710000:

# dmesg | grep -i "DSDT"
[ 0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI HIP07 00000000 INTL 20151124)

1b. This DSDT table is correctly marked as ACPI Reclaim Memory; however, I see that just preceding this entry there is also a 'Boot Code' entry covering '0x0000396c0000-0x00003970ffff':

# dmesg | grep -B 2 -i "ACPI reclaim"
[ 0.000000] efi: 0x000039670000-0x0000396bffff [Runtime Code |RUN| | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x0000396c0000-0x00003970ffff [Boot Code | | | | | | | | |WB|WT|WC|UC]
[ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC]

2. Now, I am not sure which kernel layer makes the following change (I am still digging into it), but I see that the 'Boot Code' and ACPI DSDT table regions are somehow merged into one memblock_region and appear as range '396c0000-3975ffff' in the '/proc/iomem' interface:

# cat /proc/iomem | grep -A 2 -B 2 39
00000000-3961ffff : System RAM
  00080000-00b6ffff : Kernel code
  00cb0000-0167ffff : Kernel data
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : System RAM
39760000-3976ffff : reserved
39770000-397affff : reserved
397b0000-3989ffff : reserved
398a0000-398bffff : reserved
398c0000-39d3ffff : reserved
39d40000-3ed2ffff : System RAM

3. As to why this merged region appears as a System RAM area rather than a RESERVED one, the following code path explains it:

3a. The check we added in 'arch/arm64/kernel/setup.c' does not catch the ACPI DSDT table, and so does not mark it as 'RESERVED'. This is because 'memblock_is_reserved' calls 'memblock_search' internally, which is currently implemented as:

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
{
	unsigned int left = 0, right = type->cnt;

	do {
		unsigned int mid = (right + left) / 2;

		if (addr < type->regions[mid].base)
			right = mid;
		else if (addr >= (type->regions[mid].base +
				  type->regions[mid].size))
			left = mid + 1;
		else
			return mid;
	} while (left < right);
	return -1;
}

3b. Since the 'addr' passed to 'memblock_search', calculated via '__pfn_to_phys(memblock_region_memory_base_pfn(region))', is 0x396c0000 in this case (see the iomem entry in point 2 above), we never see that this memblock is reserved for the ACPI DSDT entry at 0x39710000.

4. Now, when we run kexec-tools to load a crashdump kernel, it doesn't find an entry for the ACPI DSDT table in the reserved ranges (it finds it inside a System RAM range instead):

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
...
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved get_memory_ranges_iomem_cb: 00000000396c0000 - 000000003975ffff : System RAM get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved get_memory_ranges_iomem_cb: 0000000039770000 - 00000000397affff : reserved get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved get_memory_ranges_iomem_cb: 00000000398a0000 - 00000000398bffff : reserved get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM elf_arm64_probe: Not an ELF executable. .. 5. Now when a crash is issued to boot the crashkernel, we see it panic while trying to access the acpi tables (note that the logs below have been snipped for clarity): # echo c > /proc/sysrq-trigger ... [ 419.495621] Bye! ... [ 0.000000] efi: 0x0000396c0000-0x00003970ffff [Boot Code | | | | | | | | |WB|WT|WC|UC] [ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| | | | | | | | |WB|WT|WC|UC] ... [ 0.000000] ACPI: DSDT 0x0000000039710000 006656 (v02 HISI HIP07 00000000 INTL 20151124) ... 
[ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000010200000-0x00000000301fffff] [ 0.000000] node 0: [mem 0x0000000039620000-0x00000000396bffff] [ 0.000000] node 0: [mem 0x0000000039760000-0x000000003976ffff] [ 0.000000] node 0: [mem 0x00000000397b0000-0x000000003989ffff] [ 0.000000] node 0: [mem 0x00000000398c0000-0x0000000039d3ffff] [ 0.000000] node 0: [mem 0x000000003ed30000-0x000000003ed5ffff] ... [ 0.039309] ACPI: Core revision 20170728 [ 0.044383] Unable to handle kernel paging request at virtual address ffff000009f10027 [ 0.052386] Mem abort info: [ 0.055201] Exception class = DABT (current EL), IL = 32 bits [ 0.061179] SET = 0, FnV = 0 [ 0.064258] EA = 0, S1PTW = 0 [ 0.067424] Data abort info: [ 0.070326] ISV = 0, ISS = 0x00000021 [ 0.074195] CM = 0, WnR = 0 [ 0.077187] swapper pgtable: 64k pages, 48-bit VAs, pgd = ffff000009650000 [ 0.084133] [ffff000009f10027] *pgd=00000000301d0003, *pud=00000000301d0003, *pmd=00000000301c0003, *pte=00e8000039710707 [ 0.095215] Internal error: Oops: 96000021 [#1] SMP [ 0.100139] Modules linked in: [ 0.103219] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0+ #30 [ 0.109373] task: ffff000008d05580 task.stack: ffff000008cc0000 [ 0.115356] PC is at acpi_ns_lookup+0x25c/0x3c0 [ 0.119929] LR is at acpi_ds_load1_begin_op+0xa4/0x294 [ 0.125117] pc : [<ffff0000084a862c>] lr : [<ffff00000849d3c0>] pstate: 60000045 [ 0.132589] sp : ffff000008ccfb40 [ 0.135930] x29: ffff000008ccfb40 x28: ffff000008a9c18c [ 0.141295] x27: ffff0000088be820 x26: 0000000000000000 [ 0.146659] x25: 000000000000001b x24: 0000000000000001 [ 0.152024] x23: 0000000000000001 x22: ffff000009f10027 [ 0.157389] x21: ffff000008ccfc50 x20: 0000000000000001 [ 0.162753] x19: 000000000000001b x18: 0000000000000005 [ 0.168117] x17: 0000000000000000 x16: 0000000000000000 [ 0.173481] x15: 0000000000000000 x14: 000000000000038e [ 0.178846] x13: ffffffff00000000 x12: ffffffffffffffff [ 0.184210] x11: 0000000000000006 x10: 00000000ffffff76 [ 0.189574] x9 
: 000000000000005f x8 : ffff800014670140 [ 0.194939] x7 : 0000000000000000 x6 : ffff000008ccfc50 [ 0.200303] x5 : ffff800012d45000 x4 : 0000000000000001 [ 0.205668] x3 : ffff000008ccfbe0 x2 : ffff0000095e3a00 [ 0.211032] x1 : ffff000009f10027 x0 : 0000000000000000 [ 0.216397] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) [ 0.223166] Call trace: [ 0.225629] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) [ 0.232136] fa00: 0000000000000000 ffff000009f10027 ffff0000095e3a00 ffff000008ccfbe0 [ 0.240048] fa20: 0000000000000001 ffff800012d45000 ffff000008ccfc50 0000000000000000 [ 0.247960] fa40: ffff800014670140 000000000000005f 00000000ffffff76 0000000000000006 [ 0.255872] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000 [ 0.263785] fa80: 0000000000000000 0000000000000000 0000000000000005 000000000000001b [ 0.271697] faa0: 0000000000000001 ffff000008ccfc50 ffff000009f10027 0000000000000001 [ 0.279609] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820 [ 0.287521] fae0: ffff000008a9c18c ffff000008ccfb40 ffff00000849d3c0 ffff000008ccfb40 [ 0.295433] fb00: ffff0000084a862c 0000000060000045 ffff000008ccfb40 ffff000008261918 [ 0.303345] fb20: ffffffffffffffff ffff0000087f193c ffff000008ccfb40 ffff0000084a862c [ 0.311258] [<ffff0000084a862c>] acpi_ns_lookup+0x25c/0x3c0 [ 0.316885] [<ffff00000849d3c0>] acpi_ds_load1_begin_op+0xa4/0x294 [ 0.323128] [<ffff0000084af374>] acpi_ps_build_named_op+0xc4/0x198 [ 0.329371] [<ffff0000084af594>] acpi_ps_create_op+0x14c/0x270 [ 0.335262] [<ffff0000084aee70>] acpi_ps_parse_loop+0x188/0x5c8 [ 0.341241] [<ffff0000084aff10>] acpi_ps_parse_aml+0xb0/0x2b8 [ 0.347044] [<ffff0000084aacd8>] acpi_ns_one_complete_parse+0x144/0x184 [ 0.353726] [<ffff0000084aad60>] acpi_ns_parse_table+0x48/0x68 [ 0.359616] [<ffff0000084aa194>] acpi_ns_load_table+0x4c/0xdc [ 0.365420] [<ffff0000084b51c0>] acpi_tb_load_namespace+0xe4/0x264 [ 0.371664] [<ffff000008bafd64>] acpi_load_tables+0x48/0xc0 
[ 0.377292] [<ffff000008badfd0>] acpi_early_init+0x9c/0xd0
[ 0.382832] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c

So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT table' ranges to be merged into a single region at '0x0000396c0000-0x00003975ffff', which cannot be marked as RESERVED using 'memblock_is_reserved'.

Any pointers?

Regards,
Bhupesh

^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-15 10:58 ` Bhupesh Sharma
@ 2017-11-16  7:00 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-11-16  7:00 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	Bhupesh SHARMA

Bhupesh,

On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>
(snip)

> # dmesg | grep -B 2 -i "ACPI reclaim"
> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>
> 2. Now, I am not sure which kernel layer does the following changes (I am
> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
> DSDT table regions are somehow merged into one memblock_region and appear as
> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>
> # cat /proc/iomem | grep -A 2 -B 2 39
> 00000000-3961ffff : System RAM
>   00080000-00b6ffff : Kernel code
>   00cb0000-0167ffff : Kernel data
>   0e800000-2e7fffff : Crash kernel
> 39620000-396bffff : reserved
> 396c0000-3975ffff : System RAM
> 39760000-3976ffff : reserved
> 39770000-397affff : reserved
> 397b0000-3989ffff : reserved
> 398a0000-398bffff : reserved
> 398c0000-39d3ffff : reserved
> 39d40000-3ed2ffff : System RAM
>
(snip)
>
> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
> table' ranges to be merged into a single region at
> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
> 'memblock_is_reserved'.

Simple:) The short answer is that memblock_add() does.

The long answer:
First, please note that memblock maintains two type of regions list,
"memory" and "reserved".

    efi_init()
        reserve_regions()
            early_init_dt_add_memory_arch()
                memblock_add()
                    memblock_add_range(memblock.memory)

The memory regions described in efi.memmap are added to "memory" list
with all the neighboring regions being merged into ones,
in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.

The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
reserve_regions(), which creates an isolated region since it now has
a different attribute.
Consequently only "Boot Code" and "ACPI Reclaim Memory" are
unified.

Look at request_standard_resources(). It handles only "memory" list,
and doesn't care about whether any arbitrary part of memory is in
"reserved" list or not.

Thanks,
-Takahiro AKASHI

>
> Any pointers?
>
> Regards,
> Bhupesh
>

^ permalink raw reply	[flat|nested] 135+ messages in thread
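[Editorial illustration, not part of the original message.] The merging behaviour described above can be sketched in user space. This is a minimal sketch that assumes the merge rule is "physically contiguous and identical flags". The names `sketch_region`, `SKETCH_NOMAP`, `sketch_merge_regions()` and `sketch_dump()` are hypothetical and are not the kernel's memblock API; the real code also tracks NUMA node ids and applies the NOMAP mark in a separate step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a "no linear mapping" attribute. */
#define SKETCH_NOMAP 0x1u

/* Hypothetical, simplified region descriptor (not struct memblock_region). */
struct sketch_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/*
 * Merge neighbouring entries of an array sorted by base address, in place,
 * when they are physically contiguous and carry identical flags.
 * Returns the number of entries left after merging.
 */
static size_t sketch_merge_regions(struct sketch_region *r, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    r[out - 1].base + r[out - 1].size == r[i].base &&
		    r[out - 1].flags == r[i].flags) {
			/* Same attributes and adjacent: extend the previous entry. */
			r[out - 1].size += r[i].size;
		} else {
			/* Gap or differing flags: keep it as a separate entry. */
			r[out++] = r[i];
		}
	}
	return out;
}

/* Print each region as an inclusive "start-end" range plus its flags. */
static void sketch_dump(const struct sketch_region *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		printf("%08llx-%08llx flags=%#x\n",
		       (unsigned long long)r[i].base,
		       (unsigned long long)(r[i].base + r[i].size - 1),
		       r[i].flags);
}
```

Feeding it the three descriptors from the dmesg output, with "Runtime Code" flagged `SKETCH_NOMAP` and the other two unflagged, leaves two entries. The flagged one stays isolated, while "Boot Code" and "ACPI Reclaim Memory" collapse into the single 396c0000-3975ffff range seen in /proc/iomem.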
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-16  7:00 ` AKASHI Takahiro
@ 2017-11-26  8:29 ` Bhupesh SHARMA
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh SHARMA @ 2017-11-26  8:29 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	Bhupesh SHARMA

Hi Akashi,

On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> Bhupesh,
>
> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>>
> (snip)
>
>> # dmesg | grep -B 2 -i "ACPI reclaim"
>> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>
>> 2. Now, I am not sure which kernel layer does the following changes (I am
>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
>> DSDT table regions are somehow merged into one memblock_region and appear as
>> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>>
>> # cat /proc/iomem | grep -A 2 -B 2 39
>> 00000000-3961ffff : System RAM
>>   00080000-00b6ffff : Kernel code
>>   00cb0000-0167ffff : Kernel data
>>   0e800000-2e7fffff : Crash kernel
>> 39620000-396bffff : reserved
>> 396c0000-3975ffff : System RAM
>> 39760000-3976ffff : reserved
>> 39770000-397affff : reserved
>> 397b0000-3989ffff : reserved
>> 398a0000-398bffff : reserved
>> 398c0000-39d3ffff : reserved
>> 39d40000-3ed2ffff : System RAM
>>
> (snip)
>>
>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
>> table' ranges to be merged into a single region at
>> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
>> 'memblock_is_reserved'.
>
> Simple:) The short answer is that memblock_add() does.
>
> The long answer:
> First, please note that memblock maintains two type of regions list,
> "memory" and "reserved".
>
>     efi_init()
>         reserve_regions()
>             early_init_dt_add_memory_arch()
>                 memblock_add()
>                     memblock_add_range(memblock.memory)
>
> The memory regions described in efi.memmap are added to "memory" list
> with all the neighboring regions being merged into ones,
> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.
>
> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
> reserve_regions(), which creates an isolated region since it now has
> a different attribute.
> Consequently only "Boot Code" and "ACPI Reclaim Memory" are
> unified.
>
> Look at request_standard_resources(). It handles only "memory" list,
> and doesn't care about whether any arbitrary part of memory is in
> "reserved" list or not.

Thanks for the pointers. Now I did some experiments and traversed the
whole memblock path and I see
how these two regions get merged into a single region which is later
on recognized by
'request_standard_resources()' as a System RAM region rather than a
RESERVED region.

I recently reproduced this on a APM mustang with latest kernel as well
when acpi is used to boot the machine, which makes me believe that
this is a generic issue for arm64 machines with the 4.14 kernel and if
they use acpi=force as the boot method.

I am not sure, if a fix/or hack would be suitable for all underlying
arm64 machines, but I am trying one on the arm64 machines I have to
see if it fixes the issue.

@Ard:

Hi Ard,

I think to create and test a clean solution for all arm64 boards it
will take some time, in the meantime should we consider reverting the
commit [1] to make sure that acpi enabled arm64 machines can boot with
4.14?

Please let me know your opinion.

[1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
ACPI reclaim memory as MEMBLOCK_NOMAP)

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-26  8:29 ` Bhupesh SHARMA
@ 2017-12-04 14:02 ` Ard Biesheuvel
  -1 siblings, 0 replies; 135+ messages in thread
From: Ard Biesheuvel @ 2017-12-04 14:02 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: AKASHI Takahiro, Bhupesh Sharma, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse

On 26 November 2017 at 08:29, Bhupesh SHARMA <bhupesh.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Akashi,
>
> On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> Bhupesh,
>>
>> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>>>
>> (snip)
>>
>>> # dmesg | grep -B 2 -i "ACPI reclaim"
>>> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>
>>> 2. Now, I am not sure which kernel layer does the following changes (I am
>>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
>>> DSDT table regions are somehow merged into one memblock_region and appear as
>>> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>>>
>>> # cat /proc/iomem | grep -A 2 -B 2 39
>>> 00000000-3961ffff : System RAM
>>>   00080000-00b6ffff : Kernel code
>>>   00cb0000-0167ffff : Kernel data
>>>   0e800000-2e7fffff : Crash kernel
>>> 39620000-396bffff : reserved
>>> 396c0000-3975ffff : System RAM
>>> 39760000-3976ffff : reserved
>>> 39770000-397affff : reserved
>>> 397b0000-3989ffff : reserved
>>> 398a0000-398bffff : reserved
>>> 398c0000-39d3ffff : reserved
>>> 39d40000-3ed2ffff : System RAM
>>>
>> (snip)
>>>
>>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
>>> table' ranges to be merged into a single region at
>>> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
>>> 'memblock_is_reserved'.
>>
>> Simple:) The short answer is that memblock_add() does.
>>
>> The long answer:
>> First, please note that memblock maintains two type of regions list,
>> "memory" and "reserved".
>>
>>     efi_init()
>>         reserve_regions()
>>             early_init_dt_add_memory_arch()
>>                 memblock_add()
>>                     memblock_add_range(memblock.memory)
>>
>> The memory regions described in efi.memmap are added to "memory" list
>> with all the neighboring regions being merged into ones,
>> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.
>>
>> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
>> reserve_regions(), which creates an isolated region since it now has
>> a different attribute.
>> Consequently only "Boot Code" and "ACPI Reclaim Memory" are
>> unified.
>>
>> Look at request_standard_resources(). It handles only "memory" list,
>> and doesn't care about whether any arbitrary part of memory is in
>> "reserved" list or not.
>
> Thanks for the pointers. Now I did some experiments and traversed the
> whole memblock path and I see
> how these two regions get merged into a single region which is later
> on recognized by
> 'request_standard_resources()' as a System RAM region rather than a
> RESERVED region.
>
> I recently reproduced this on a APM mustang with latest kernel as well
> when acpi is used to boot the machine, which makes me believe that
> this is a generic issue for arm64 machines with the 4.14 kernel and if
> they use acpi=force as the boot method.
>
> I am not sure, if a fix/or hack would be suitable for all underlying
> arm64 machines, but I am trying one on the arm64 machines I have to
> see if it fixes the issue.
>
> @Ard:
>
> Hi Ard,
>
> I think to create and test a clean solution for all arm64 boards it
> will take some time, in the meantime should we consider reverting the
> commit [1] to make sure that acpi enabled arm64 machines can boot with
> 4.14?
>
> Please let me know your opinion.
>
> [1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
> ACPI reclaim memory as MEMBLOCK_NOMAP)
>

I don't think that is really going to help tbh.

ACPI reclaim regions are not the only regions that are
memblock_reserve()d and need to be reserved by the incoming kernel as
well. So as far as I can tell, this is a symptom of an underlying
issue that we will need to solve, and reverting the code that exposed
it will not make the bug go away.

-- 
Ard.

^ permalink raw reply	[flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-04 14:02 ` Ard Biesheuvel
@ 2017-12-12 21:51 ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-12 21:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Bhupesh SHARMA, AKASHI Takahiro, Matt Fleming,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse,
	kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

Hi Ard, Akashi

On Mon, Dec 4, 2017 at 7:32 PM, Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On 26 November 2017 at 08:29, Bhupesh SHARMA <bhupesh.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Akashi,
>>
>> On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro
>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> Bhupesh,
>>>
>>> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>>>>
>>> (snip)
>>>
>>>> # dmesg | grep -B 2 -i "ACPI reclaim"
>>>> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>>
>>>> 2. Now, I am not sure which kernel layer does the following changes (I am
>>>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
>>>> DSDT table regions are somehow merged into one memblock_region and appear as
>>>> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>>>>
>>>> # cat /proc/iomem | grep -A 2 -B 2 39
>>>> 00000000-3961ffff : System RAM
>>>>   00080000-00b6ffff : Kernel code
>>>>   00cb0000-0167ffff : Kernel data
>>>>   0e800000-2e7fffff : Crash kernel
>>>> 39620000-396bffff : reserved
>>>> 396c0000-3975ffff : System RAM
>>>> 39760000-3976ffff : reserved
>>>> 39770000-397affff : reserved
>>>> 397b0000-3989ffff : reserved
>>>> 398a0000-398bffff : reserved
>>>> 398c0000-39d3ffff : reserved
>>>> 39d40000-3ed2ffff : System RAM
>>>>
>>> (snip)
>>>>
>>>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
>>>> table' ranges to be merged into a single region at
>>>> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using
>>>> 'memblock_is_reserved'.
>>>
>>> Simple:) The short answer is that memblock_add() does.
>>>
>>> The long answer:
>>> First, please note that memblock maintains two type of regions list,
>>> "memory" and "reserved".
>>>
>>>     efi_init()
>>>         reserve_regions()
>>>             early_init_dt_add_memory_arch()
>>>                 memblock_add()
>>>                     memblock_add_range(memblock.memory)
>>>
>>> The memory regions described in efi.memmap are added to "memory" list
>>> with all the neighboring regions being merged into ones,
>>> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.
>>>
>>> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
>>> reserve_regions(), which creates an isolated region since it now has
>>> a different attribute.
>>> Consequently only "Boot Code" and "ACPI Reclaim Memory" are
>>> unified.
>>>
>>> Look at request_standard_resources(). It handles only "memory" list,
>>> and doesn't care about whether any arbitrary part of memory is in
>>> "reserved" list or not.
>>
>> Thanks for the pointers. Now I did some experiments and traversed the
>> whole memblock path and I see
>> how these two regions get merged into a single region which is later
>> on recognized by
>> 'request_standard_resources()' as a System RAM region rather than a
>> RESERVED region.
>>
>> I recently reproduced this on a APM mustang with latest kernel as well
>> when acpi is used to boot the machine, which makes me believe that
>> this is a generic issue for arm64 machines with the 4.14 kernel and if
>> they use acpi=force as the boot method.
>>
>> I am not sure, if a fix/or hack would be suitable for all underlying
>> arm64 machines, but I am trying one on the arm64 machines I have to
>> see if it fixes the issue.
>>
>> @Ard:
>>
>> Hi Ard,
>>
>> I think to create and test a clean solution for all arm64 boards it
>> will take some time, in the meantime should we consider reverting the
>> commit [1] to make sure that acpi enabled arm64 machines can boot with
>> 4.14?
>>
>> Please let me know your opinion.
>>
>> [1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
>> ACPI reclaim memory as MEMBLOCK_NOMAP)
>>
>
> I don't think that is really going to help tbh.
>
> ACPI reclaim regions are not the only regions that are
> memblock_reserve()d and need to be reserved by the incoming kernel as
> well. So as far as I can tell, this is a symptom of an underlying
> issue that we will need to solve, and reverting the code that exposed
> it will not make the bug go away.
>

Looking deeper into the issue: the arm64 kexec-tools uses the
'linux,usable-memory-range' dt property to allow the crash dump kernel to
identify its own usable memory and exclude, at its boot time, any
other memory areas that are part of the panicked kernel's memory.
(see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
, for details)

1). Now when 'kexec -p' is executed, this node is patched up only
with the crashkernel memory range:

	/* add linux,usable-memory-range */
	nodeoffset = fdt_path_offset(new_buf, "/chosen");
	result = fdt_setprop_range(new_buf, nodeoffset,
			PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
			address_cells, size_cells);

(see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
, for details)

2). This excludes the ACPI reclaim regions irrespective of whether
they are marked as System RAM or as RESERVED, as the
'linux,usable-memory-range' dt node is patched up only with
'crash_reserved_mem' and not 'system_memory_ranges'.

3). As a result when the crashkernel boots up it doesn't find this
ACPI memory and crashes while trying to access the same:

# kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d

[snip..]

Reserved memory range
000000000e800000-000000002e7fffff (0)

Coredump memory ranges
0000000000000000-000000000e7fffff (0)
000000002e800000-000000003961ffff (0)
0000000039d40000-000000003ed2ffff (0)
000000003ed60000-000000003fbfffff (0)
0000001040000000-0000001ffbffffff (0)
0000002000000000-0000002ffbffffff (0)
0000009000000000-0000009ffbffffff (0)
000000a000000000-000000affbffffff (0)

4). So if we revert Ard's patch or just comment the fixing up of the
memory cap'ing passed to the crash kernel inside
'arch/arm64/mm/init.c' (see below):

static void __init fdt_enforce_memory_region(void)
{
	struct memblock_region reg = {
		.size = 0,
	};

	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);

	/* comment this out */
	// if (reg.size)
	//	memblock_cap_memory_range(reg.base, reg.size);
}

5). Both the above temporary solutions fix the problem.

6). However exposing all System RAM regions to the crashkernel is not
advisable and may cause the crashkernel or some crashkernel drivers to
fail.

6a). I am trying an approach now, where the ACPI reclaim regions are
added to '/proc/iomem' separately as ACPI reclaim regions by the
kernel code and on the other hand the user-space 'kexec-tools' will
pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
dt node 'linux,usable-memory-range'

6b). The kernel code currently looks like the following:

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 30ad2f085d1f..867bdec7c692 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
 {
 	struct memblock_region *region;
 	struct resource *res;
+	phys_addr_t addr_start, addr_end;

 	kernel_code.start   = __pa_symbol(_text);
 	kernel_code.end     = __pa_symbol(__init_begin - 1);
@@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
 			res->name  = "reserved";
 			res->flags = IORESOURCE_MEM;
 		} else {
-			res->name  = "System RAM";
-			res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+			addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
+			addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
+			if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
+				res->name  = "ACPI reclaim region";
+				res->flags = IORESOURCE_MEM;
+			} else {
+				res->name  = "System RAM";
+				res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+			}
 		}
+
 		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
 		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;

@@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p)

 	request_standard_resources();

+	efi_memmap_unmap();
 	early_ioremap_reset();

 	if (acpi_disabled)
diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 80d1a885def5..a7c522eac640 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -259,7 +259,6 @@ void __init efi_init(void)

 	reserve_regions();
 	efi_esrt_init();
-	efi_memmap_unmap();

 	memblock_reserve(params.mmap & PAGE_MASK,
 			 PAGE_ALIGN(params.mmap_size +

After this change the ACPI reclaim regions are properly recognized in
'/proc/iomem':

# cat /proc/iomem | grep -i ACPI
396c0000-3975ffff : ACPI reclaim region
39770000-397affff : ACPI reclaim region
398a0000-398bffff : ACPI reclaim region

6c). I am currently changing the 'kexec-tools' and will finish the
testing over the next few days.

I just wanted to know your opinion on this issue, so that I will be
able to propose a fix on the above lines.

Also Cc'ing kexec mailing list for more inputs on changes proposed to
kexec-tools.

Thanks,
Bhupesh

^ permalink raw reply related	[flat|nested] 135+ messages in thread
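[Editorial illustration, not part of the original message.] Step 6a) relies on user space picking the new entries out of '/proc/iomem'. The following is a minimal sketch of that parsing step, assuming the resource name "ACPI reclaim region" from the proposed patch above (it is not a name guaranteed by mainline kernels). The helpers `sketch_match_iomem_line()`, `sketch_collect_ranges()` and `sketch_range_contains()` are hypothetical and are not taken from kexec-tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical range type; 'end' is inclusive, as printed in /proc/iomem. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Parse one /proc/iomem-style line of the form "start-end : name".
 * Returns 1 and fills *out when the resource name equals 'wanted',
 * 0 when the line is malformed or names a different resource.
 */
static int sketch_match_iomem_line(const char *line, const char *wanted,
				   struct sketch_range *out)
{
	unsigned long long start, end;
	int consumed = 0;
	const char *name;
	size_t len;

	/* The leading blank in the format skips indentation of nested entries. */
	if (sscanf(line, " %llx-%llx : %n", &start, &end, &consumed) != 2)
		return 0;
	if (consumed == 0)	/* the " : " separator was missing */
		return 0;

	name = line + consumed;
	len = strcspn(name, "\n");	/* ignore the trailing newline */
	if (len != strlen(wanted) || strncmp(name, wanted, len) != 0)
		return 0;

	out->start = start;
	out->end = end;
	return 1;
}

/* Collect up to 'max' matching ranges from an already opened stream. */
static size_t sketch_collect_ranges(FILE *fp, const char *wanted,
				    struct sketch_range *out, size_t max)
{
	char line[256];
	size_t n = 0;

	while (n < max && fgets(line, sizeof(line), fp))
		if (sketch_match_iomem_line(line, wanted, &out[n]))
			n++;
	return n;
}

/* Return 1 when 'inner' lies entirely within 'outer'. */
static int sketch_range_contains(const struct sketch_range *outer,
				 const struct sketch_range *inner)
{
	return inner->start >= outer->start && inner->end <= outer->end;
}
```

With the crash kernel range 0e800000-2e7fffff as `outer` and the 396c0000-3975ffff entry as `inner`, `sketch_range_contains()` returns 0. That matches point 2) above: a usable-memory range built only from the crash kernel reservation cannot cover the ACPI tables, so the collected ranges would have to be appended to the property separately.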
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-12 21:51 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-12 21:51 UTC (permalink / raw) To: linux-arm-kernel Hi Ard, Akashi On Mon, Dec 4, 2017 at 7:32 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 26 November 2017 at 08:29, Bhupesh SHARMA <bhupesh.linux@gmail.com> wrote: >> Hi Akashi, >> >> On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >>> Bhupesh, >>> >>> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote: >>>> >>> (snip) >>> >>>> # dmesg | grep -B 2 -i "ACPI reclaim" >>>> [ 0.000000] efi: 0x000039670000-0x0000396bffff [Runtime Code |RUN| | >>>> | | | | | |WB|WT|WC|UC] >>>> [ 0.000000] efi: 0x0000396c0000-0x00003970ffff [Boot Code | | | | >>>> | | | | |WB|WT|WC|UC] >>>> [ 0.000000] efi: 0x000039710000-0x00003975ffff [ACPI Reclaim Memory| >>>> | | | | | | | |WB|WT|WC|UC] >>>> >>>> 2. 
Now, I am not sure which kernel layer does the following changes (I am >>>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI >>>> DSDT table regions are somehow merged into one memblock_region and appear as >>>> range '396c0000-3975ffff' in the '/proc/iomem' interface: >>>> >>>> # cat /proc/iomem | grep -A 2 -B 2 39 >>>> 00000000-3961ffff : System RAM >>>> 00080000-00b6ffff : Kernel code >>>> 00cb0000-0167ffff : Kernel data >>>> 0e800000-2e7fffff : Crash kernel >>>> 39620000-396bffff : reserved >>>> 396c0000-3975ffff : System RAM >>>> 39760000-3976ffff : reserved >>>> 39770000-397affff : reserved >>>> 397b0000-3989ffff : reserved >>>> 398a0000-398bffff : reserved >>>> 398c0000-39d3ffff : reserved >>>> 39d40000-3ed2ffff : System RAM >>>> >>> (snip) >>>> >>>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT >>>> table' ranges to be merged into a single region at >>>> '0x0000396c0000-0x00003970ffff' which cannot be marked as RESERVED using >>>> 'memblock_is_reserved'. >>> >>> Simple:) The short answer is that memblock_add() does. >>> >>> The long answer: >>> First, please note that memblock maintains two type of regions list, >>> "memory" and "reserved". >>> >>> efi_init() >>> reserve_regions() >>> early_init_dt_add_memory_arch() >>> memblock_add() >>> memblock_add_range(memblock.memory) >>> >>> The memory regions described in efi.memmap are added to "memory" list >>> with all the neighboring regions being merged into ones, >>> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others. >>> >>> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in >>> reserve_regions(), which creates an isolated region since it now has >>> a different attribute. >>> Consequently only "Boot Code" and "ACPI Reclaim Memory" are >>> unified. >>> >>> Look at request_standard_resources(). 
It handles only "memory" list, >>> and doesn't care about whether any arbitrary part of memory is in >>> "reserved" list or not. >> >> Thanks for the pointers. Now I did some experiments and traversed the >> whole memblock path and I see >> how these two regions get merged into a single region which is later >> on recognized by >> 'request_standard_resources()' as a System RAM region rather than a >> RESERVED region. >> >> I recently reproduced this on a APM mustang with latest kernel as well >> when acpi is used to boot the machine, which makes me believe that >> this is a generic issue for arm64 machines with the 4.14 kernel and if >> they use acpi=force as the boot method. >> >> I am not sure, if a fix/or hack would be suitable for all underlying >> arm64 machines, but I am trying one on the arm64 machines I have to >> see if it fixes the issue. >> >> @Ard: >> >> Hi Ard, >> >> I think to create and test a clean solution for all arm64 boards it >> will take some time, in the meantime should we consider reverting the >> commit [1] to make sure that acpi enabled arm64 machines can boot with >> 4.14? >> >> Please let me know your opinion. >> >> [1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark >> ACPI reclaim memory as MEMBLOCK_NOMAP) >> > > I don't think that is really going to help tbh. > > ACPI reclaim regions are not the only regions that are > memblock_reserve()d and need to be reserved by the incoming kernel as > well. So as far as I can tell, this is a symptom of an underlying > issue that we will need to solve, and reverting the code that exposed > it will not make the bug go away. > Looking deeper into the issue, since the arm64 kexec-tools uses the 'linux,usable-memory-range' dt property to allow crash dump kernel to identify its own usable memory and exclude, at its boot time, any other memory areas that are part of the panicked kernel's memory. (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt , for details) 1). 
Now when 'kexec -p' is executed, this node is patched up only with the crashkernel memory range: /* add linux,usable-memory-range */ nodeoffset = fdt_path_offset(new_buf, "/chosen"); result = fdt_setprop_range(new_buf, nodeoffset, PROP_USABLE_MEM_RANGE, &crash_reserved_mem, address_cells, size_cells); (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 , for details) 2). This excludes the ACPI reclaim regions irrespective of whether they are marked as System RAM or as RESERVED, because the 'linux,usable-memory-range' dt property is patched up only with 'crash_reserved_mem' and not with 'system_memory_ranges'. 3). As a result, when the crashkernel boots up it doesn't find this ACPI memory and crashes while trying to access it: # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d [snip..] Reserved memory range 000000000e800000-000000002e7fffff (0) Coredump memory ranges 0000000000000000-000000000e7fffff (0) 000000002e800000-000000003961ffff (0) 0000000039d40000-000000003ed2ffff (0) 000000003ed60000-000000003fbfffff (0) 0000001040000000-0000001ffbffffff (0) 0000002000000000-0000002ffbffffff (0) 0000009000000000-0000009ffbffffff (0) 000000a000000000-000000affbffffff (0) 4). So we can either revert Ard's patch or just comment out the memory capping applied to the crash kernel inside 'arch/arm64/mm/init.c' (see below): static void __init fdt_enforce_memory_region(void) { struct memblock_region reg = { .size = 0, }; of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); if (reg.size) //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */ } 5). Both of the above temporary workarounds fix the problem. 6). However, exposing all System RAM regions to the crashkernel is not advisable and may cause the crashkernel or some crashkernel drivers to fail. 6a).
I am trying an approach now, where the ACPI reclaim regions are added to '/proc/iomem' separately as ACPI reclaim regions by the kernel code and on the other hand the user-space 'kexec-tools' will pick up the ACPI reclaim regions from '/proc/iomem' and add it to the dt node 'linux,usable-memory-range' 6b). The kernel code currently looks like the following: diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 30ad2f085d1f..867bdec7c692 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -206,6 +206,7 @@ static void __init request_standard_resources(void) { struct memblock_region *region; struct resource *res; + phys_addr_t addr_start, addr_end; kernel_code.start = __pa_symbol(_text); kernel_code.end = __pa_symbol(__init_begin - 1); @@ -218,9 +219,17 @@ static void __init request_standard_resources(void) res->name = "reserved"; res->flags = IORESOURCE_MEM; } else { - res->name = "System RAM"; - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; + addr_start = __pfn_to_phys(memblock_region_reserved_base_pfn(region)); + addr_end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1; + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) { + res->name = "ACPI reclaim region"; + res->flags = IORESOURCE_MEM; + } else { + res->name = "System RAM"; + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; + } } + res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) request_standard_resources(); + efi_memmap_unmap(); early_ioremap_reset(); if (acpi_disabled) diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c index 80d1a885def5..a7c522eac640 100644 --- a/drivers/firmware/efi/arm-init.c +++ b/drivers/firmware/efi/arm-init.c @@ -259,7 +259,6 @@ void __init efi_init(void) reserve_regions(); 
efi_esrt_init(); - efi_memmap_unmap(); memblock_reserve(params.mmap & PAGE_MASK, PAGE_ALIGN(params.mmap_size + After this change the ACPI reclaim regions are properly recognized in '/proc/iomem': # cat /proc/iomem | grep -i ACPI 396c0000-3975ffff : ACPI reclaim region 39770000-397affff : ACPI reclaim region 398a0000-398bffff : ACPI reclaim region 6c). I am currently changing the 'kexec-tools' and will finish the testing over the next few days. I just wanted to know your opinion on this issue, so that I will be able to propose a fix on the above lines. Also Cc'ing kexec mailing list for more inputs on changes proposed to kexec-tools. Thanks, Bhupesh ^ permalink raw reply related [flat|nested] 135+ messages in thread
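Editorial sketch of the user-space half of step 6a), assuming the kernel exports entries named "ACPI reclaim region" in '/proc/iomem' as in the output above. The type `mem_range` and the helpers `match_iomem_line()` and `collect_ranges()` are hypothetical names invented for this illustration; they are not kexec-tools functions, and the real change to kexec-tools may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type for this sketch; not the kexec-tools structure. */
struct mem_range {
	unsigned long long start;
	unsigned long long end;	/* inclusive, as printed in /proc/iomem */
};

/*
 * Parse one /proc/iomem-style line such as
 *   "396c0000-3975ffff : ACPI reclaim region"
 * Returns 1 and fills *out when the resource name equals 'name',
 * 0 when the line is malformed or names a different resource.
 */
static int match_iomem_line(const char *line, const char *name,
			    struct mem_range *out)
{
	unsigned long long start, end;
	int consumed = 0;
	const char *label;
	size_t len;

	assert(line && name && out);

	/* %n records where the resource name begins; it stays 0 if " : " is missing. */
	if (sscanf(line, " %llx-%llx : %n", &start, &end, &consumed) != 2)
		return 0;
	if (consumed == 0)
		return 0;

	/* Compare the name, ignoring a trailing newline left by fgets(). */
	label = line + consumed;
	len = strcspn(label, "\n");
	if (len != strlen(name) || strncmp(label, name, len) != 0)
		return 0;

	out->start = start;
	out->end = end;
	return 1;
}

/* Collect up to 'max' ranges whose resource name equals 'name'. */
static size_t collect_ranges(const char *const *lines, size_t nlines,
			     const char *name,
			     struct mem_range *out, size_t max)
{
	size_t i, n = 0;

	for (i = 0; i < nlines && n < max; i++)
		if (match_iomem_line(lines[i], name, &out[n]))
			n++;
	return n;
}
```

In a real tool the lines would come from reading '/proc/iomem' with fgets(), and the collected ranges would then be appended to the ranges written into 'linux,usable-memory-range'.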
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-12 21:51 ` Bhupesh Sharma @ 2017-12-13 10:26 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-13 10:26 UTC (permalink / raw) To: Bhupesh Sharma Cc: Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A Bhupesh, Ard, On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > Hi Ard, Akashi > (snip) > Looking deeper into the issue, since the arm64 kexec-tools uses the > 'linux,usable-memory-range' dt property to allow crash dump kernel to > identify its own usable memory and exclude, at its boot time, any > other memory areas that are part of the panicked kernel's memory. > (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > , for details) Right. > 1). Now when 'kexec -p' is executed, this node is patched up only > with the crashkernel memory range: > > /* add linux,usable-memory-range */ > nodeoffset = fdt_path_offset(new_buf, "/chosen"); > result = fdt_setprop_range(new_buf, nodeoffset, > PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > address_cells, size_cells); > > (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > , for details) > > 2). This excludes the ACPI reclaim regions irrespective of whether > they are marked as System RAM or as RESERVED. As, > 'linux,usable-memory-range' dt node is patched up only with > 'crash_reserved_mem' and not 'system_memory_ranges' > > 3). As a result when the crashkernel boots up it doesn't find this > ACPI memory and crashes while trying to access the same: > > # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > -r`.img --reuse-cmdline -d > > [snip..] 
> > Reserved memory range > 000000000e800000-000000002e7fffff (0) > > Coredump memory ranges > 0000000000000000-000000000e7fffff (0) > 000000002e800000-000000003961ffff (0) > 0000000039d40000-000000003ed2ffff (0) > 000000003ed60000-000000003fbfffff (0) > 0000001040000000-0000001ffbffffff (0) > 0000002000000000-0000002ffbffffff (0) > 0000009000000000-0000009ffbffffff (0) > 000000a000000000-000000affbffffff (0) > > 4). So if we revert Ard's patch or just comment the fixing up of the > memory cap'ing passed to the crash kernel inside > 'arch/arm64/mm/init.c' (see below): > > static void __init fdt_enforce_memory_region(void) > { > struct memblock_region reg = { > .size = 0, > }; > > of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > if (reg.size) > //memblock_cap_memory_range(reg.base, reg.size); /* > comment this out */ > } Please just don't do that. It can fatally damage the memory contents of the *crashed* kernel. > 5). Both the above temporary solutions fix the problem. > > 6). However exposing all System RAM regions to the crashkernel is not > advisable and may cause the crashkernel or some crashkernel drivers to > fail. > > 6a). I am trying an approach now, where the ACPI reclaim regions are > added to '/proc/iomem' separately as ACPI reclaim regions by the > kernel code and on the other hand the user-space 'kexec-tools' will > pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > dt node 'linux,usable-memory-range' I still don't understand why we need to carry over the information about "ACPI Reclaim memory" to the crash dump kernel. In my understanding, such regions are free to be reused by the kernel after some point of initialization. Why does the crash dump kernel need to know about them? (In other words, can or should we skip some part of the ACPI-related init code on the crash dump kernel?) Thanks, -Takahiro AKASHI > 6b).
The kernel code currently looks like the following: > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c > index 30ad2f085d1f..867bdec7c692 100644 > --- a/arch/arm64/kernel/setup.c > +++ b/arch/arm64/kernel/setup.c > @@ -206,6 +206,7 @@ static void __init request_standard_resources(void) > { > struct memblock_region *region; > struct resource *res; > + phys_addr_t addr_start, addr_end; > > kernel_code.start = __pa_symbol(_text); > kernel_code.end = __pa_symbol(__init_begin - 1); > @@ -218,9 +219,17 @@ static void __init request_standard_resources(void) > res->name = "reserved"; > res->flags = IORESOURCE_MEM; > } else { > - res->name = "System RAM"; > - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > + addr_start = > __pfn_to_phys(memblock_region_reserved_base_pfn(region)); > + addr_end = > __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1; > + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) > || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) { > + res->name = "ACPI reclaim region"; > + res->flags = IORESOURCE_MEM; > + } else { > + res->name = "System RAM"; > + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > + } > } > + > res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); > res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; > > @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) > > request_standard_resources(); > > + efi_memmap_unmap(); > early_ioremap_reset(); > > if (acpi_disabled) > diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c > index 80d1a885def5..a7c522eac640 100644 > --- a/drivers/firmware/efi/arm-init.c > +++ b/drivers/firmware/efi/arm-init.c > @@ -259,7 +259,6 @@ void __init efi_init(void) > > reserve_regions(); > efi_esrt_init(); > - efi_memmap_unmap(); > > memblock_reserve(params.mmap & PAGE_MASK, > PAGE_ALIGN(params.mmap_size + > > > After this change the ACPI reclaim regions are properly recognized in > 
'/proc/iomem': > > # cat /proc/iomem | grep -i ACPI > 396c0000-3975ffff : ACPI reclaim region > 39770000-397affff : ACPI reclaim region > 398a0000-398bffff : ACPI reclaim region > > 6c). I am currently changing the 'kexec-tools' and will finish the > testing over the next few days. > > I just wanted to know your opinion on this issue, so that I will be > able to propose a fix on the above lines. > > Also Cc'ing kexec mailing list for more inputs on changes proposed to > kexec-tools. > > Thanks, > Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-13 10:26 ` AKASHI Takahiro @ 2017-12-13 10:49 ` Ard Biesheuvel -1 siblings, 0 replies; 135+ messages in thread From: Ard Biesheuvel @ 2017-12-13 10:49 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On 13 December 2017 at 10:26, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > Bhupesh, Ard, > > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> Hi Ard, Akashi >> > (snip) > >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> identify its own usable memory and exclude, at its boot time, any >> other memory areas that are part of the panicked kernel's memory. >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> , for details) > > Right. > >> 1). Now when 'kexec -p' is executed, this node is patched up only >> with the crashkernel memory range: >> >> /* add linux,usable-memory-range */ >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> result = fdt_setprop_range(new_buf, nodeoffset, >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> address_cells, size_cells); >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> , for details) >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> they are marked as System RAM or as RESERVED. As, >> 'linux,usable-memory-range' dt node is patched up only with >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> 3). 
As a result when the crashkernel boots up it doesn't find this >> ACPI memory and crashes while trying to access the same: >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> -r`.img --reuse-cmdline -d >> >> [snip..] >> >> Reserved memory range >> 000000000e800000-000000002e7fffff (0) >> >> Coredump memory ranges >> 0000000000000000-000000000e7fffff (0) >> 000000002e800000-000000003961ffff (0) >> 0000000039d40000-000000003ed2ffff (0) >> 000000003ed60000-000000003fbfffff (0) >> 0000001040000000-0000001ffbffffff (0) >> 0000002000000000-0000002ffbffffff (0) >> 0000009000000000-0000009ffbffffff (0) >> 000000a000000000-000000affbffffff (0) >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> memory cap'ing passed to the crash kernel inside >> 'arch/arm64/mm/init.c' (see below): >> >> static void __init fdt_enforce_memory_region(void) >> { >> struct memblock_region reg = { >> .size = 0, >> }; >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> >> if (reg.size) >> //memblock_cap_memory_range(reg.base, reg.size); /* >> comment this out */ >> } > > Please just don't do that. It can cause a fatal damage on > memory contents of the *crashed* kernel. > >> 5). Both the above temporary solutions fix the problem. >> >> 6). However exposing all System RAM regions to the crashkernel is not >> advisable and may cause the crashkernel or some crashkernel drivers to >> fail. >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> kernel code and on the other hand the user-space 'kexec-tools' will >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> dt node 'linux,usable-memory-range' > > I still don't understand why we need to carry over the information > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > such regions are free to be reused by the kernel after some point of > initialization.
Why does crash dump kernel need to know about them? > Not really. According to the UEFI spec, they can be reclaimed after the OS has initialized, i.e., when it has consumed the ACPI tables and no longer needs them. Of course, in order to be able to boot a kexec kernel, those regions need to be preserved, which is why they are memblock_reserve()'d now. So it seems that kexec does not honour the memblock_reserve() table when booting the next kernel. > (In other words, can or should we skip some part of ACPI-related init code > on crash dump kernel?) > I don't think so. And the change to the handling of ACPI reclaim regions only revealed the bug; it did not create it (given that other memblock_reserve()'d regions may be affected as well). >> 6b). The kernel code currently looks like the following: >> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c >> index 30ad2f085d1f..867bdec7c692 100644 >> --- a/arch/arm64/kernel/setup.c >> +++ b/arch/arm64/kernel/setup.c >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void) >> { >> struct memblock_region *region; >> struct resource *res; >> + phys_addr_t addr_start, addr_end; >> >> kernel_code.start = __pa_symbol(_text); >> kernel_code.end = __pa_symbol(__init_begin - 1); >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void) >> res->name = "reserved"; >> res->flags = IORESOURCE_MEM; >> } else { >> - res->name = "System RAM"; >> - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> + addr_start = >> __pfn_to_phys(memblock_region_reserved_base_pfn(region)); >> + addr_end = >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1; >> + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) { >> + res->name = "ACPI reclaim region"; >> + res->flags = IORESOURCE_MEM; >> + } else { >> + res->name = "System RAM"; >> + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> + } >> } >> + >> res->start =
__pfn_to_phys(memblock_region_memory_base_pfn(region)); >> res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; >> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) >> >> request_standard_resources(); >> >> + efi_memmap_unmap(); >> early_ioremap_reset(); >> >> if (acpi_disabled) >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c >> index 80d1a885def5..a7c522eac640 100644 >> --- a/drivers/firmware/efi/arm-init.c >> +++ b/drivers/firmware/efi/arm-init.c >> @@ -259,7 +259,6 @@ void __init efi_init(void) >> >> reserve_regions(); >> efi_esrt_init(); >> - efi_memmap_unmap(); >> >> memblock_reserve(params.mmap & PAGE_MASK, >> PAGE_ALIGN(params.mmap_size + >> >> >> After this change the ACPI reclaim regions are properly recognized in >> '/proc/iomem': >> >> # cat /proc/iomem | grep -i ACPI >> 396c0000-3975ffff : ACPI reclaim region >> 39770000-397affff : ACPI reclaim region >> 398a0000-398bffff : ACPI reclaim region >> >> 6c). I am currently changing the 'kexec-tools' and will finish the >> testing over the next few days. >> >> I just wanted to know your opinion on this issue, so that I will be >> able to propose a fix on the above lines. >> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to >> kexec-tools. >> >> Thanks, >> Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
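Editorial sketch of why the NOMAP change could expose this problem. It is a toy model, not kernel code: it assumes that capping the memory map to the 'linux,usable-memory-range' window keeps regions flagged NOMAP and trims or drops every other region outside the window. That assumption is an editor's reading of the thread, not something stated in it. `struct region`, `cap_memory_range()` and `in_memory_map()` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock "memory" entry. */
struct region {
	unsigned long long base;
	unsigned long long size;
	bool nomap;	/* models the MEMBLOCK_NOMAP flag */
};

/*
 * Toy model of capping the memory map to [cap_base, cap_base + cap_size):
 * regions flagged nomap survive untouched, every other region is trimmed
 * to the window or dropped. Works in place and returns the new count.
 */
static size_t cap_memory_range(struct region *r, size_t n,
			       unsigned long long cap_base,
			       unsigned long long cap_size)
{
	unsigned long long cap_end = cap_base + cap_size;
	size_t i, kept = 0;

	assert(r != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned long long start = r[i].base;
		unsigned long long end = r[i].base + r[i].size;
		bool nomap = r[i].nomap;

		if (!nomap) {
			if (start < cap_base)
				start = cap_base;
			if (end > cap_end)
				end = cap_end;
			if (start >= end)
				continue;	/* entirely outside the window */
		}
		r[kept].base = start;
		r[kept].size = end - start;
		r[kept].nomap = nomap;
		kept++;
	}
	return kept;
}

/* True if 'addr' still falls inside some region of the map. */
static bool in_memory_map(const struct region *r, size_t n,
			  unsigned long long addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr >= r[i].base && addr - r[i].base < r[i].size)
			return true;
	return false;
}
```

With the crash kernel window 0x0e800000-0x2e7fffff from the thread, a region at 0x396c0000 stays in the map only while it carries the nomap flag. Once it is plain memory it falls outside the window and disappears, which matches the symptom described above.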
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-13 10:49 ` Ard Biesheuvel @ 2017-12-13 12:16 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-13 12:16 UTC (permalink / raw) To: Ard Biesheuvel Cc: Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > On 13 December 2017 at 10:26, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > Bhupesh, Ard, > > > > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> Hi Ard, Akashi > >> > > (snip) > > > >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> identify its own usable memory and exclude, at its boot time, any > >> other memory areas that are part of the panicked kernel's memory. > >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> , for details) > > > > Right. > > > >> 1). Now when 'kexec -p' is executed, this node is patched up only > >> with the crashkernel memory range: > >> > >> /* add linux,usable-memory-range */ > >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> result = fdt_setprop_range(new_buf, nodeoffset, > >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> address_cells, size_cells); > >> > >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> , for details) > >> > >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> they are marked as System RAM or as RESERVED. As, > >> 'linux,usable-memory-range' dt node is patched up only with > >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> > >> 3). 
As a result when the crashkernel boots up it doesn't find this > >> ACPI memory and crashes while trying to access the same: > >> > >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> -r`.img --reuse-cmdline -d > >> > >> [snip..] > >> > >> Reserved memory range > >> 000000000e800000-000000002e7fffff (0) > >> > >> Coredump memory ranges > >> 0000000000000000-000000000e7fffff (0) > >> 000000002e800000-000000003961ffff (0) > >> 0000000039d40000-000000003ed2ffff (0) > >> 000000003ed60000-000000003fbfffff (0) > >> 0000001040000000-0000001ffbffffff (0) > >> 0000002000000000-0000002ffbffffff (0) > >> 0000009000000000-0000009ffbffffff (0) > >> 000000a000000000-000000affbffffff (0) > >> > >> 4). So if we revert Ard's patch or just comment the fixing up of the > >> memory cap'ing passed to the crash kernel inside > >> 'arch/arm64/mm/init.c' (see below): > >> > >> static void __init fdt_enforce_memory_region(void) > >> { > >> struct memblock_region reg = { > >> .size = 0, > >> }; > >> > >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> > >> if (reg.size) > >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> comment this out */ > >> } > > > > Please just don't do that. It can cause a fatal damage on > > memory contents of the *crashed* kernel. > > > >> 5). Both the above temporary solutions fix the problem. > >> > >> 6). However exposing all System RAM regions to the crashkernel is not > >> advisable and may cause the crashkernel or some crashkernel drivers to > >> fail. > >> > >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> kernel code and on the other hand the user-space 'kexec-tools' will > >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> dt node 'linux,usable-memory-range' > > > > I still don't understand why we need to carry over the information > > about "ACPI Reclaim memory" to crash dump kernel. 
In my understandings, > > such regions are free to be reused by the kernel after some point of > > initialization. Why does crash dump kernel need to know about them? > > > > Not really. According to the UEFI spec, they can be reclaimed after > the OS has initialized, i.e., when it has consumed the ACPI tables and > no longer needs them. Of course, in order to be able to boot a kexec > kernel, those regions needs to be preserved, which is why they are > memblock_reserve()'d now. For my better understandings, who is actually accessing such regions during boot time, uefi itself or efistub? > So it seems that kexec does not honour the memblock_reserve() table > when booting the next kernel. not really. > > (In other words, can or should we skip some part of ACPI-related init code > > on crash dump kernel?) > > > > I don't think so. And the change to the handling of ACPI reclaim > regions only revealed the bug, not created it (given that other > memblock_reserve regions may be affected as well) As whether we should honor such reserved regions over kexec'ing depends on each one's specific nature, we will have to take care one-by-one. As a matter of fact, no information about "reserved" memblocks is exposed to user space (via proc/iomem). -Takahiro AKASHI > > >> 6b). 
The kernel code currently looks like the following: > >> > >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c > >> index 30ad2f085d1f..867bdec7c692 100644 > >> --- a/arch/arm64/kernel/setup.c > >> +++ b/arch/arm64/kernel/setup.c > >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void) > >> { > >> struct memblock_region *region; > >> struct resource *res; > >> + phys_addr_t addr_start, addr_end; > >> > >> kernel_code.start = __pa_symbol(_text); > >> kernel_code.end = __pa_symbol(__init_begin - 1); > >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void) > >> res->name = "reserved"; > >> res->flags = IORESOURCE_MEM; > >> } else { > >> - res->name = "System RAM"; > >> - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > >> + addr_start = > >> __pfn_to_phys(memblock_region_reserved_base_pfn(region)); > >> + addr_end = > >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1; > >> + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) > >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) { > >> + res->name = "ACPI reclaim region"; > >> + res->flags = IORESOURCE_MEM; > >> + } else { > >> + res->name = "System RAM"; > >> + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > >> + } > >> } > >> + > >> res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); > >> res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; > >> > >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) > >> > >> request_standard_resources(); > >> > >> + efi_memmap_unmap(); > >> early_ioremap_reset(); > >> > >> if (acpi_disabled) > >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c > >> index 80d1a885def5..a7c522eac640 100644 > >> --- a/drivers/firmware/efi/arm-init.c > >> +++ b/drivers/firmware/efi/arm-init.c > >> @@ -259,7 +259,6 @@ void __init efi_init(void) > >> > >> reserve_regions(); > >> efi_esrt_init(); > >> - efi_memmap_unmap(); > >> > >> 
memblock_reserve(params.mmap & PAGE_MASK, > >> PAGE_ALIGN(params.mmap_size + > >> > >> > >> After this change the ACPI reclaim regions are properly recognized in > >> '/proc/iomem': > >> > >> # cat /proc/iomem | grep -i ACPI > >> 396c0000-3975ffff : ACPI reclaim region > >> 39770000-397affff : ACPI reclaim region > >> 398a0000-398bffff : ACPI reclaim region > >> > >> 6c). I am currently changing the 'kexec-tools' and will finish the > >> testing over the next few days. > >> > >> I just wanted to know your opinion on this issue, so that I will be > >> able to propose a fix on the above lines. > >> > >> Also Cc'ing kexec mailing list for more inputs on changes proposed to > >> kexec-tools. > >> > >> Thanks, > >> Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
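Points 1) to 3) quoted above come down to a range check: the crash dump kernel is capped to the single 'linux,usable-memory-range', so any ACPI reclaim region outside that range is simply absent from its memory map. A minimal sketch in C, assuming the addresses printed in this thread; struct mem_range, range_contains() and count_not_covered() are hypothetical names used for illustration only, not kexec-tools or kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical address range, as printed by 'kexec -d' and /proc/iomem. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Return true if 'inner' lies entirely within 'outer'. */
static bool range_contains(const struct mem_range *outer,
			   const struct mem_range *inner)
{
	assert(outer && inner);
	assert(outer->start <= outer->end && inner->start <= inner->end);

	return inner->start >= outer->start && inner->end <= outer->end;
}

/*
 * Count the regions that are not fully covered by 'usable'. A crash
 * dump kernel capped to 'usable' has no mapping for those regions.
 */
static unsigned int count_not_covered(const struct mem_range *usable,
				      const struct mem_range *regions,
				      unsigned int n)
{
	unsigned int i, missing = 0;

	assert(usable && (regions || n == 0));

	for (i = 0; i < n; i++) {
		if (!range_contains(usable, &regions[i]))
			missing++;
	}

	return missing;
}
```

Called with the reserved range 0x0e800000-0x2e7fffff as 'usable' and the three "ACPI reclaim region" entries listed in 6b), none of the three is covered, which is consistent with the crash kernel faulting when it reads the ACPI tables.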
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-13 12:16 ` AKASHI Takahiro @ 2017-12-13 12:17 ` Ard Biesheuvel -1 siblings, 0 replies; 135+ messages in thread From: Ard Biesheuvel @ 2017-12-13 12:17 UTC (permalink / raw) To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On 13 December 2017 at 12:16, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> On 13 December 2017 at 10:26, AKASHI Takahiro >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > Bhupesh, Ard, >> > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> Hi Ard, Akashi >> >> >> > (snip) >> > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> identify its own usable memory and exclude, at its boot time, any >> >> other memory areas that are part of the panicked kernel's memory. >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> , for details) >> > >> > Right. >> > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >> with the crashkernel memory range: >> >> >> >> /* add linux,usable-memory-range */ >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> address_cells, size_cells); >> >> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> , for details) >> >> >> >> 2). 
This excludes the ACPI reclaim regions irrespective of whether >> >> they are marked as System RAM or as RESERVED. As, >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> ACPI memory and crashes while trying to access the same: >> >> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> -r`.img --reuse-cmdline -d >> >> >> >> [snip..] >> >> >> >> Reserved memory range >> >> 000000000e800000-000000002e7fffff (0) >> >> >> >> Coredump memory ranges >> >> 0000000000000000-000000000e7fffff (0) >> >> 000000002e800000-000000003961ffff (0) >> >> 0000000039d40000-000000003ed2ffff (0) >> >> 000000003ed60000-000000003fbfffff (0) >> >> 0000001040000000-0000001ffbffffff (0) >> >> 0000002000000000-0000002ffbffffff (0) >> >> 0000009000000000-0000009ffbffffff (0) >> >> 000000a000000000-000000affbffffff (0) >> >> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> >> memory cap'ing passed to the crash kernel inside >> >> 'arch/arm64/mm/init.c' (see below): >> >> >> >> static void __init fdt_enforce_memory_region(void) >> >> { >> >> struct memblock_region reg = { >> >> .size = 0, >> >> }; >> >> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> >> >> >> if (reg.size) >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> >> comment this out */ >> >> } >> > >> > Please just don't do that. It can cause a fatal damage on >> > memory contents of the *crashed* kernel. >> > >> >> 5). Both the above temporary solutions fix the problem. >> >> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> >> fail. >> >> >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> dt node 'linux,usable-memory-range' >> > >> > I still don't understand why we need to carry over the information >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> > such regions are free to be reused by the kernel after some point of >> > initialization. Why does crash dump kernel need to know about them? >> > >> >> Not really. According to the UEFI spec, they can be reclaimed after >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> no longer needs them. Of course, in order to be able to boot a kexec >> kernel, those regions needs to be preserved, which is why they are >> memblock_reserve()'d now. > > For my better understandings, who is actually accessing such regions > during boot time, uefi itself or efistub? > No, only the kernel. This is where the ACPI tables are stored. For instance, on QEMU we have ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 01000013) ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 BXPC 00000001) ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 BXPC 00000001) ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 BXPC 00000001) ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 BXPC 00000001) ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 BXPC 00000001) ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 BXPC 00000001) ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 BXPC 00000001) covered by efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] ... efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] 
>> So it seems that kexec does not honour the memblock_reserve() table >> when booting the next kernel. > > not really. > >> > (In other words, can or should we skip some part of ACPI-related init code >> > on crash dump kernel?) >> > >> >> I don't think so. And the change to the handling of ACPI reclaim >> regions only revealed the bug, not created it (given that other >> memblock_reserve regions may be affected as well) > > As whether we should honor such reserved regions over kexec'ing > depends on each one's specific nature, we will have to take care one-by-one. > As a matter of fact, no information about "reserved" memblocks is > exposed to user space (via proc/iomem). > That is why I suggested (somewhere in this thread?) to not expose them as 'System RAM'. Do you think that could solve this? > >> >> >> 6b). The kernel code currently looks like the following: >> >> >> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c >> >> index 30ad2f085d1f..867bdec7c692 100644 >> >> --- a/arch/arm64/kernel/setup.c >> >> +++ b/arch/arm64/kernel/setup.c >> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void) >> >> { >> >> struct memblock_region *region; >> >> struct resource *res; >> >> + phys_addr_t addr_start, addr_end; >> >> >> >> kernel_code.start = __pa_symbol(_text); >> >> kernel_code.end = __pa_symbol(__init_begin - 1); >> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void) >> >> res->name = "reserved"; >> >> res->flags = IORESOURCE_MEM; >> >> } else { >> >> - res->name = "System RAM"; >> >> - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> >> + addr_start = >> >> __pfn_to_phys(memblock_region_reserved_base_pfn(region)); >> >> + addr_end = >> >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1; >> >> + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) >> >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) { >> >> + res->name = "ACPI reclaim region"; >> >> + res->flags = 
IORESOURCE_MEM; >> >> + } else { >> >> + res->name = "System RAM"; >> >> + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> >> + } >> >> } >> >> + >> >> res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); >> >> res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; >> >> >> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) >> >> >> >> request_standard_resources(); >> >> >> >> + efi_memmap_unmap(); >> >> early_ioremap_reset(); >> >> >> >> if (acpi_disabled) >> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c >> >> index 80d1a885def5..a7c522eac640 100644 >> >> --- a/drivers/firmware/efi/arm-init.c >> >> +++ b/drivers/firmware/efi/arm-init.c >> >> @@ -259,7 +259,6 @@ void __init efi_init(void) >> >> >> >> reserve_regions(); >> >> efi_esrt_init(); >> >> - efi_memmap_unmap(); >> >> >> >> memblock_reserve(params.mmap & PAGE_MASK, >> >> PAGE_ALIGN(params.mmap_size + >> >> >> >> >> >> After this change the ACPI reclaim regions are properly recognized in >> >> '/proc/iomem': >> >> >> >> # cat /proc/iomem | grep -i ACPI >> >> 396c0000-3975ffff : ACPI reclaim region >> >> 39770000-397affff : ACPI reclaim region >> >> 398a0000-398bffff : ACPI reclaim region >> >> >> >> 6c). I am currently changing the 'kexec-tools' and will finish the >> >> testing over the next few days. >> >> >> >> I just wanted to know your opinion on this issue, so that I will be >> >> able to propose a fix on the above lines. >> >> >> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to >> >> kexec-tools. >> >> >> >> Thanks, >> >> Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
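Step 6a) quoted above relies on user space picking the new entries out of '/proc/iomem'. A rough sketch of that parsing step in C, assuming the "ACPI reclaim region" resource name from the proposed patch in 6b) (it is a name from this proposal, not something a stock kernel is known to print); parse_iomem_line() and struct mem_range are hypothetical helpers, and this is not the kexec-tools implementation:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Inclusive physical address range, matching the /proc/iomem notation. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Parse one /proc/iomem line of the form "<start>-<end> : <name>".
 * Returns true and fills 'out' only if the resource name equals 'name'.
 */
static bool parse_iomem_line(const char *line, const char *name,
			     struct mem_range *out)
{
	uint64_t start, end;
	int consumed = -1;	/* stays -1 if the " : " separator is missing */
	size_t len;

	assert(line && name && out);

	if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %n",
		   &start, &end, &consumed) != 2 || consumed < 0)
		return false;

	len = strlen(name);
	if (strncmp(line + consumed, name, len) != 0)
		return false;

	/* Reject longer names that merely share the prefix. */
	if (line[consumed + len] != '\0' && line[consumed + len] != '\n')
		return false;

	if (start > end)
		return false;

	out->start = start;
	out->end = end;
	return true;
}
```

A caller would feed this each line read from '/proc/iomem' and collect the matching ranges; how those ranges are then encoded into the 'linux,usable-memory-range' property is left open here, since that is exactly what is still under discussion in this thread.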
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-13 12:17 ` Ard Biesheuvel 0 siblings, 0 replies; 135+ messages in thread From: Ard Biesheuvel @ 2017-12-13 12:17 UTC (permalink / raw) To: linux-arm-kernel On 13 December 2017 at 12:16, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> On 13 December 2017 at 10:26, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > Bhupesh, Ard, >> > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> Hi Ard, Akashi >> >> >> > (snip) >> > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> identify its own usable memory and exclude, at its boot time, any >> >> other memory areas that are part of the panicked kernel's memory. >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> , for details) >> > >> > Right. >> > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >> with the crashkernel memory range: >> >> >> >> /* add linux,usable-memory-range */ >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> address_cells, size_cells); >> >> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> , for details) >> >> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> they are marked as System RAM or as RESERVED. As, >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> >> >> 3). 
As a result when the crashkernel boots up it doesn't find this >> >> ACPI memory and crashes while trying to access the same: >> >> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> -r`.img --reuse-cmdline -d >> >> >> >> [snip..] >> >> >> >> Reserved memory range >> >> 000000000e800000-000000002e7fffff (0) >> >> >> >> Coredump memory ranges >> >> 0000000000000000-000000000e7fffff (0) >> >> 000000002e800000-000000003961ffff (0) >> >> 0000000039d40000-000000003ed2ffff (0) >> >> 000000003ed60000-000000003fbfffff (0) >> >> 0000001040000000-0000001ffbffffff (0) >> >> 0000002000000000-0000002ffbffffff (0) >> >> 0000009000000000-0000009ffbffffff (0) >> >> 000000a000000000-000000affbffffff (0) >> >> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> >> memory cap'ing passed to the crash kernel inside >> >> 'arch/arm64/mm/init.c' (see below): >> >> >> >> static void __init fdt_enforce_memory_region(void) >> >> { >> >> struct memblock_region reg = { >> >> .size = 0, >> >> }; >> >> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> >> >> >> if (reg.size) >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> >> comment this out */ >> >> } >> > >> > Please just don't do that. It can cause a fatal damage on >> > memory contents of the *crashed* kernel. >> > >> >> 5). Both the above temporary solutions fix the problem. >> >> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> >> fail. >> >> >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> dt node 'linux,usable-memory-range' >> > >> > I still don't understand why we need to carry over the information >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> > such regions are free to be reused by the kernel after some point of >> > initialization. Why does crash dump kernel need to know about them? >> > >> >> Not really. According to the UEFI spec, they can be reclaimed after >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> no longer needs them. Of course, in order to be able to boot a kexec >> kernel, those regions needs to be preserved, which is why they are >> memblock_reserve()'d now. > > For my better understandings, who is actually accessing such regions > during boot time, uefi itself or efistub? > No, only the kernel. This is where the ACPI tables are stored. For instance, on QEMU we have ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 01000013) ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 BXPC 00000001) ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 BXPC 00000001) ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 BXPC 00000001) ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 BXPC 00000001) ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 BXPC 00000001) ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 BXPC 00000001) ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 BXPC 00000001) covered by efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] ... efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] 
>> So it seems that kexec does not honour the memblock_reserve() table >> when booting the next kernel. > > not really. > >> > (In other words, can or should we skip some part of ACPI-related init code >> > on crash dump kernel?) >> > >> >> I don't think so. And the change to the handling of ACPI reclaim >> regions only revealed the bug, not created it (given that other >> memblock_reserve regions may be affected as well) > > As whether we should honor such reserved regions over kexec'ing > depends on each one's specific nature, we will have to take care one-by-one. > As a matter of fact, no information about "reserved" memblocks is > exposed to user space (via proc/iomem). > That is why I suggested (somewhere in this thread?) to not expose them as 'System RAM'. Do you think that could solve this? > >> >> >> 6b). The kernel code currently looks like the following: >> >> >> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c >> >> index 30ad2f085d1f..867bdec7c692 100644 >> >> --- a/arch/arm64/kernel/setup.c >> >> +++ b/arch/arm64/kernel/setup.c >> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void) >> >> { >> >> struct memblock_region *region; >> >> struct resource *res; >> >> + phys_addr_t addr_start, addr_end; >> >> >> >> kernel_code.start = __pa_symbol(_text); >> >> kernel_code.end = __pa_symbol(__init_begin - 1); >> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void) >> >> res->name = "reserved"; >> >> res->flags = IORESOURCE_MEM; >> >> } else { >> >> - res->name = "System RAM"; >> >> - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> >> + addr_start = >> >> __pfn_to_phys(memblock_region_reserved_base_pfn(region)); >> >> + addr_end = >> >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1; >> >> + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY) >> >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) { >> >> + res->name = "ACPI reclaim region"; >> >> + res->flags = 
IORESOURCE_MEM; >> >> + } else { >> >> + res->name = "System RAM"; >> >> + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> >> + } >> >> } >> >> + >> >> res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); >> >> res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; >> >> >> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) >> >> >> >> request_standard_resources(); >> >> >> >> + efi_memmap_unmap(); >> >> early_ioremap_reset(); >> >> >> >> if (acpi_disabled) >> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c >> >> index 80d1a885def5..a7c522eac640 100644 >> >> --- a/drivers/firmware/efi/arm-init.c >> >> +++ b/drivers/firmware/efi/arm-init.c >> >> @@ -259,7 +259,6 @@ void __init efi_init(void) >> >> >> >> reserve_regions(); >> >> efi_esrt_init(); >> >> - efi_memmap_unmap(); >> >> >> >> memblock_reserve(params.mmap & PAGE_MASK, >> >> PAGE_ALIGN(params.mmap_size + >> >> >> >> >> >> After this change the ACPI reclaim regions are properly recognized in >> >> '/proc/iomem': >> >> >> >> # cat /proc/iomem | grep -i ACPI >> >> 396c0000-3975ffff : ACPI reclaim region >> >> 39770000-397affff : ACPI reclaim region >> >> 398a0000-398bffff : ACPI reclaim region >> >> >> >> 6c). I am currently changing the 'kexec-tools' and will finish the >> >> testing over the next few days. >> >> >> >> I just wanted to know your opinion on this issue, so that I will be >> >> able to propose a fix on the above lines. >> >> >> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to >> >> kexec-tools. >> >> >> >> Thanks, >> >> Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-13 12:17 ` Ard Biesheuvel @ 2017-12-13 19:22 ` Bhupesh SHARMA -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh SHARMA @ 2017-12-13 19:22 UTC (permalink / raw) To: Ard Biesheuvel Cc: AKASHI Takahiro, Bhupesh Sharma, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A Hi Ard, Akashi, On Wed, Dec 13, 2017 at 5:47 PM, Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On 13 December 2017 at 12:16, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >>> On 13 December 2017 at 10:26, AKASHI Takahiro >>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >>> > Bhupesh, Ard, >>> > >>> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >>> >> Hi Ard, Akashi >>> >> >>> > (snip) >>> > >>> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >>> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >>> >> identify its own usable memory and exclude, at its boot time, any >>> >> other memory areas that are part of the panicked kernel's memory. >>> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >>> >> , for details) >>> > >>> > Right. >>> > >>> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >>> >> with the crashkernel memory range: >>> >> >>> >> /* add linux,usable-memory-range */ >>> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >>> >> result = fdt_setprop_range(new_buf, nodeoffset, >>> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >>> >> address_cells, size_cells); >>> >> >>> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >>> >> , for details) >>> >> >>> >> 2). This excludes the ACPI reclaim regions irrespective of whether >>> >> they are marked as System RAM or as RESERVED. As, >>> >> 'linux,usable-memory-range' dt node is patched up only with >>> >> 'crash_reserved_mem' and not 'system_memory_ranges' >>> >> >>> >> 3). As a result when the crashkernel boots up it doesn't find this >>> >> ACPI memory and crashes while trying to access the same: >>> >> >>> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >>> >> -r`.img --reuse-cmdline -d >>> >> >>> >> [snip..] >>> >> >>> >> Reserved memory range >>> >> 000000000e800000-000000002e7fffff (0) >>> >> >>> >> Coredump memory ranges >>> >> 0000000000000000-000000000e7fffff (0) >>> >> 000000002e800000-000000003961ffff (0) >>> >> 0000000039d40000-000000003ed2ffff (0) >>> >> 000000003ed60000-000000003fbfffff (0) >>> >> 0000001040000000-0000001ffbffffff (0) >>> >> 0000002000000000-0000002ffbffffff (0) >>> >> 0000009000000000-0000009ffbffffff (0) >>> >> 000000a000000000-000000affbffffff (0) >>> >> >>> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the
>>> >> memory cap'ing passed to the crash kernel inside
>>> >> 'arch/arm64/mm/init.c' (see below):
>>> >>
>>> >> static void __init fdt_enforce_memory_region(void)
>>> >> {
>>> >>     struct memblock_region reg = {
>>> >>         .size = 0,
>>> >>     };
>>> >>
>>> >>     of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>>> >>
>>> >>     if (reg.size)
>>> >>         //memblock_cap_memory_range(reg.base, reg.size); /*
>>> >> comment this out */
>>> >> }
>>> >
>>> > Please just don't do that. It can cause a fatal damage on
>>> > memory contents of the *crashed* kernel.
>>> >
>>> >> 5). Both the above temporary solutions fix the problem.
>>> >>
>>> >> 6). However exposing all System RAM regions to the crashkernel is not
>>> >> advisable and may cause the crashkernel or some crashkernel drivers to
>>> >> fail.
>>> >>
>>> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>>> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>>> >> kernel code and on the other hand the user-space 'kexec-tools' will
>>> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>>> >> dt node 'linux,usable-memory-range'
>>> >
>>> > I still don't understand why we need to carry over the information
>>> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>>> > such regions are free to be reused by the kernel after some point of
>>> > initialization. Why does crash dump kernel need to know about them?
>>> >
>>>
>>> Not really. According to the UEFI spec, they can be reclaimed after
>>> the OS has initialized, i.e., when it has consumed the ACPI tables and
>>> no longer needs them. Of course, in order to be able to boot a kexec
>>> kernel, those regions needs to be preserved, which is why they are
>>> memblock_reserve()'d now.
>>
>> For my better understandings, who is actually accessing such regions
>> during boot time, uefi itself or efistub?
>>
>
> No, only the kernel.
This is where the ACPI tables are stored. For > instance, on QEMU we have > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > 01000013) > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > BXPC 00000001) > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > BXPC 00000001) > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > BXPC 00000001) > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > BXPC 00000001) > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > BXPC 00000001) > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > BXPC 00000001) > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > BXPC 00000001) > > covered by > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > ... > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > > >>> So it seems that kexec does not honour the memblock_reserve() table >>> when booting the next kernel. >> >> not really. >> >>> > (In other words, can or should we skip some part of ACPI-related init code >>> > on crash dump kernel?) >>> > >>> >>> I don't think so. And the change to the handling of ACPI reclaim >>> regions only revealed the bug, not created it (given that other >>> memblock_reserve regions may be affected as well) >> >> As whether we should honor such reserved regions over kexec'ing >> depends on each one's specific nature, we will have to take care one-by-one. >> As a matter of fact, no information about "reserved" memblocks is >> exposed to user space (via proc/iomem). >> > > That is why I suggested (somewhere in this thread?) to not expose them > as 'System RAM'. Do you think that could solve this? I agree. So how about my proposal (please see my last reply) - to expose these regions as "ACPI reclaim regions" in /proc/iomem. 
Please note that we already have several instances where driver regions
are explicitly labelled with their own concise names in /proc/iomem,
e.g.:

# cat /proc/iomem | grep -i serial
1c021000-1c02101f : serial

If we expose only the ACPI reclaim regions to the crashkernel (along
with the normal crash kernel memory range), we avoid exposing all System
RAM or reserved regions to the crashkernel, which may cause issues with
crashkernel boot or crash coredump save operations.

We can then modify 'kexec-tools' to pick up these regions along with the
normal crash kernel memory range and append them to the
'linux,usable-memory-range' dt node, so that the crash kernel can
operate on them.

If you think this is OK, I can try to send an RFC patch later this week.
Please let me know.

Regards,
Bhupesh

>>>
>>> >> 6b). The kernel code currently looks like the following:
>>> >>
>>> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> >> index 30ad2f085d1f..867bdec7c692 100644
>>> >> --- a/arch/arm64/kernel/setup.c
>>> >> +++ b/arch/arm64/kernel/setup.c
>>> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
>>> >> {
>>> >> struct memblock_region *region;
>>> >> struct resource *res;
>>> >> + phys_addr_t addr_start, addr_end;
>>> >>
>>> >> kernel_code.start = __pa_symbol(_text);
>>> >> kernel_code.end = __pa_symbol(__init_begin - 1);
>>> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
>>> >> res->name = "reserved";
>>> >> res->flags = IORESOURCE_MEM;
>>> >> } else {
>>> >> - res->name = "System RAM";
>>> >> - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>> >> + addr_start =
>>> >> __pfn_to_phys(memblock_region_reserved_base_pfn(region));
>>> >> + addr_end =
>>> >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
>>> >> + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY)
>>> >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
>>> >> + res->name =
"ACPI reclaim region"; >>> >> + res->flags = IORESOURCE_MEM; >>> >> + } else { >>> >> + res->name = "System RAM"; >>> >> + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >>> >> + } >>> >> } >>> >> + >>> >> res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); >>> >> res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; >>> >> >>> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) >>> >> >>> >> request_standard_resources(); >>> >> >>> >> + efi_memmap_unmap(); >>> >> early_ioremap_reset(); >>> >> >>> >> if (acpi_disabled) >>> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c >>> >> index 80d1a885def5..a7c522eac640 100644 >>> >> --- a/drivers/firmware/efi/arm-init.c >>> >> +++ b/drivers/firmware/efi/arm-init.c >>> >> @@ -259,7 +259,6 @@ void __init efi_init(void) >>> >> >>> >> reserve_regions(); >>> >> efi_esrt_init(); >>> >> - efi_memmap_unmap(); >>> >> >>> >> memblock_reserve(params.mmap & PAGE_MASK, >>> >> PAGE_ALIGN(params.mmap_size + >>> >> >>> >> >>> >> After this change the ACPI reclaim regions are properly recognized in >>> >> '/proc/iomem': >>> >> >>> >> # cat /proc/iomem | grep -i ACPI >>> >> 396c0000-3975ffff : ACPI reclaim region >>> >> 39770000-397affff : ACPI reclaim region >>> >> 398a0000-398bffff : ACPI reclaim region >>> >> >>> >> 6c). I am currently changing the 'kexec-tools' and will finish the >>> >> testing over the next few days. >>> >> >>> >> I just wanted to know your opinion on this issue, so that I will be >>> >> able to propose a fix on the above lines. >>> >> >>> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to >>> >> kexec-tools. >>> >> >>> >> Thanks, >>> >> Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
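For illustration only, the user-space half of the proposal above (kexec-tools picking up the relabelled regions from /proc/iomem) might look roughly like the sketch below. It is not the actual kexec-tools code: `struct mem_range`, `parse_iomem_line()` and `collect_ranges()` are hypothetical names, and the "ACPI reclaim region" label only exists if the kernel patch quoted in this thread is applied.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Label proposed in the patch quoted in this thread (not a mainline name). */
#define ACPI_RECLAIM_LABEL "ACPI reclaim region"

/* Hypothetical container for one physical range. */
struct mem_range {
    uint64_t start;
    uint64_t end;   /* inclusive, as printed in /proc/iomem */
};

/*
 * Parse one /proc/iomem-style line of the form "start-end : name".
 * Returns 1 and fills *out when the name equals `label`, otherwise 0.
 */
int parse_iomem_line(const char *line, const char *label,
                     struct mem_range *out)
{
    uint64_t start, end;
    int consumed = 0;
    const char *name;
    size_t len;

    assert(line && label && out);

    /* The leading blank in the format skips the indentation of nested entries. */
    if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n",
               &start, &end, &consumed) != 2 || consumed == 0)
        return 0;

    /* Strip the trailing newline and blanks from the resource name. */
    name = line + consumed;
    len = strlen(name);
    while (len > 0 && (name[len - 1] == '\n' || name[len - 1] == '\r' ||
                       name[len - 1] == ' ' || name[len - 1] == '\t'))
        len--;

    if (len != strlen(label) || strncmp(name, label, len) != 0)
        return 0;

    out->start = start;
    out->end = end;
    return 1;
}

/* Collect up to `max` ranges carrying `label` from an iomem-style stream. */
size_t collect_ranges(FILE *fp, const char *label,
                      struct mem_range *ranges, size_t max)
{
    char line[256];
    size_t n = 0;

    assert(fp && ranges);

    while (n < max && fgets(line, sizeof(line), fp)) {
        if (parse_iomem_line(line, label, &ranges[n]))
            n++;
    }
    return n;
}
```

A caller would open /proc/iomem, pass the FILE pointer to collect_ranges(), and append the returned ranges to whatever it writes into 'linux,usable-memory-range'. Whether those ranges belong in that property at all is the open question in this thread.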
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-13 12:17 ` Ard Biesheuvel @ 2017-12-15 8:59 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-15 8:59 UTC (permalink / raw) To: Ard Biesheuvel Cc: Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > On 13 December 2017 at 12:16, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> > Bhupesh, Ard, > >> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >> Hi Ard, Akashi > >> >> > >> > (snip) > >> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >> identify its own usable memory and exclude, at its boot time, any > >> >> other memory areas that are part of the panicked kernel's memory. > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >> , for details) > >> > > >> > Right. > >> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >> >> with the crashkernel memory range: > >> >> > >> >> /* add linux,usable-memory-range */ > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >> address_cells, size_cells); > >> >> > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >> , for details) > >> >> > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >> they are marked as System RAM or as RESERVED. As, > >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >> > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >> ACPI memory and crashes while trying to access the same: > >> >> > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >> -r`.img --reuse-cmdline -d > >> >> > >> >> [snip..] > >> >> > >> >> Reserved memory range > >> >> 000000000e800000-000000002e7fffff (0) > >> >> > >> >> Coredump memory ranges > >> >> 0000000000000000-000000000e7fffff (0) > >> >> 000000002e800000-000000003961ffff (0) > >> >> 0000000039d40000-000000003ed2ffff (0) > >> >> 000000003ed60000-000000003fbfffff (0) > >> >> 0000001040000000-0000001ffbffffff (0) > >> >> 0000002000000000-0000002ffbffffff (0) > >> >> 0000009000000000-0000009ffbffffff (0) > >> >> 000000a000000000-000000affbffffff (0) > >> >> > >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the
> >> >> memory cap'ing passed to the crash kernel inside
> >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>
> >> >> static void __init fdt_enforce_memory_region(void)
> >> >> {
> >> >>     struct memblock_region reg = {
> >> >>         .size = 0,
> >> >>     };
> >> >>
> >> >>     of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>
> >> >>     if (reg.size)
> >> >>         //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >> comment this out */
> >> >> }
> >> >
> >> > Please just don't do that. It can cause a fatal damage on
> >> > memory contents of the *crashed* kernel.
> >> >
> >> >> 5). Both the above temporary solutions fix the problem.
> >> >>
> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> fail.
> >> >>
> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> dt node 'linux,usable-memory-range'
> >> >
> >> > I still don't understand why we need to carry over the information
> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> > such regions are free to be reused by the kernel after some point of
> >> > initialization. Why does crash dump kernel need to know about them?
> >> >
> >>
> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> kernel, those regions needs to be preserved, which is why they are
> >> memblock_reserve()'d now.
> > > > For my better understandings, who is actually accessing such regions > > during boot time, uefi itself or efistub? > > > > No, only the kernel. This is where the ACPI tables are stored. For > instance, on QEMU we have > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > 01000013) > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > BXPC 00000001) > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > BXPC 00000001) > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > BXPC 00000001) > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > BXPC 00000001) > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > BXPC 00000001) > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > BXPC 00000001) > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > BXPC 00000001) > > covered by > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > ... > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] OK. I mistakenly understood those regions could be freed after exiting UEFI boot services. > > >> So it seems that kexec does not honour the memblock_reserve() table > >> when booting the next kernel. > > > > not really. > > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> > on crash dump kernel?) > >> > > >> > >> I don't think so. And the change to the handling of ACPI reclaim > >> regions only revealed the bug, not created it (given that other > >> memblock_reserve regions may be affected as well) > > > > As whether we should honor such reserved regions over kexec'ing > > depends on each one's specific nature, we will have to take care one-by-one. > > As a matter of fact, no information about "reserved" memblocks is > > exposed to user space (via proc/iomem). > > > > That is why I suggested (somewhere in this thread?) 
to not expose them
> as 'System RAM'. Do you think that could solve this?

Memblock-reserving them is necessary to prevent their corruption, and
marking them under another name in /proc/iomem would also be good so
that they are not allocated as part of the crash kernel's memory.

But I'm still not convinced that we should export them in
usable-memory-range to the crash dump kernel. They will be accessed
through acpi_os_map_memory() and so won't be required to be part of
system RAM (or memblocks), I guess.
    -> Bhupesh?

Just FYI, on x86, ACPI tables seem to be exposed to the crash dump
kernel via a kernel command line parameter, "memmap=".

Thanks,
-Takahiro AKASHI

> >
> >>
> >> >> 6b). The kernel code currently looks like the following:
> >> >>
> >> >> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> >> >> index 30ad2f085d1f..867bdec7c692 100644
> >> >> --- a/arch/arm64/kernel/setup.c
> >> >> +++ b/arch/arm64/kernel/setup.c
> >> >> @@ -206,6 +206,7 @@ static void __init request_standard_resources(void)
> >> >> {
> >> >> struct memblock_region *region;
> >> >> struct resource *res;
> >> >> + phys_addr_t addr_start, addr_end;
> >> >>
> >> >> kernel_code.start = __pa_symbol(_text);
> >> >> kernel_code.end = __pa_symbol(__init_begin - 1);
> >> >> @@ -218,9 +219,17 @@ static void __init request_standard_resources(void)
> >> >> res->name = "reserved";
> >> >> res->flags = IORESOURCE_MEM;
> >> >> } else {
> >> >> - res->name = "System RAM";
> >> >> - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >> >> + addr_start =
> >> >> __pfn_to_phys(memblock_region_reserved_base_pfn(region));
> >> >> + addr_end =
> >> >> __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
> >> >> + if ((efi_mem_type(addr_start) == EFI_ACPI_RECLAIM_MEMORY)
> >> >> || (efi_mem_type(addr_end) == EFI_ACPI_RECLAIM_MEMORY)) {
> >> >> + res->name = "ACPI reclaim region";
> >> >> + res->flags = IORESOURCE_MEM;
> >> >> + } else {
> >> >> + res->name = "System RAM";
> >> >> + res->flags
= IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > >> >> + } > >> >> } > >> >> + > >> >> res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); > >> >> res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; > >> >> > >> >> @@ -292,6 +301,7 @@ void __init setup_arch(char **cmdline_p) > >> >> > >> >> request_standard_resources(); > >> >> > >> >> + efi_memmap_unmap(); > >> >> early_ioremap_reset(); > >> >> > >> >> if (acpi_disabled) > >> >> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c > >> >> index 80d1a885def5..a7c522eac640 100644 > >> >> --- a/drivers/firmware/efi/arm-init.c > >> >> +++ b/drivers/firmware/efi/arm-init.c > >> >> @@ -259,7 +259,6 @@ void __init efi_init(void) > >> >> > >> >> reserve_regions(); > >> >> efi_esrt_init(); > >> >> - efi_memmap_unmap(); > >> >> > >> >> memblock_reserve(params.mmap & PAGE_MASK, > >> >> PAGE_ALIGN(params.mmap_size + > >> >> > >> >> > >> >> After this change the ACPI reclaim regions are properly recognized in > >> >> '/proc/iomem': > >> >> > >> >> # cat /proc/iomem | grep -i ACPI > >> >> 396c0000-3975ffff : ACPI reclaim region > >> >> 39770000-397affff : ACPI reclaim region > >> >> 398a0000-398bffff : ACPI reclaim region > >> >> > >> >> 6c). I am currently changing the 'kexec-tools' and will finish the > >> >> testing over the next few days. > >> >> > >> >> I just wanted to know your opinion on this issue, so that I will be > >> >> able to propose a fix on the above lines. > >> >> > >> >> Also Cc'ing kexec mailing list for more inputs on changes proposed to > >> >> kexec-tools. > >> >> > >> >> Thanks, > >> >> Bhupesh ^ permalink raw reply [flat|nested] 135+ messages in thread
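As a rough illustration of step 6a), a user-space tool could pick the new entries out of '/proc/iomem' along the following lines. This is only a sketch under stated assumptions: 'struct mem_range' and 'parse_iomem_line()' are hypothetical names and are not taken from kexec-tools, and the resource name is assumed to be exactly the "ACPI reclaim region" string shown in the output above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for this sketch; not a kexec-tools structure. */
struct mem_range {
	unsigned long long start;
	unsigned long long end;	/* inclusive, as printed in /proc/iomem */
};

/*
 * Parse one /proc/iomem line of the form
 *     "396c0000-3975ffff : ACPI reclaim region"
 * Returns 1 and fills *out when the resource name starts with 'wanted',
 * otherwise returns 0 and leaves *out untouched.
 */
int parse_iomem_line(const char *line, const char *wanted,
		     struct mem_range *out)
{
	unsigned long long start, end;
	int consumed = 0;

	/* The leading blank in the format skips the indentation of nested entries. */
	if (sscanf(line, " %llx-%llx : %n", &start, &end, &consumed) < 2)
		return 0;

	/* %n is only reached when the " : " separator matched. */
	if (consumed == 0)
		return 0;

	/* Prefix match, so a trailing newline in 'line' is tolerated. */
	if (strncmp(line + consumed, wanted, strlen(wanted)) != 0)
		return 0;

	out->start = start;
	out->end = end;
	return 1;
}
```

A caller would read '/proc/iomem' line by line (for example with fgets()) and collect every range for which the function returns 1. Whether such ranges should then be added to 'linux,usable-memory-range' at all is exactly the point under discussion in this thread.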
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-15  8:59 ` AKASHI Takahiro
@ 2017-12-15  9:35 ` Ard Biesheuvel
From: Ard Biesheuvel @ 2017-12-15  9:35 UTC
To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A

On 15 December 2017 at 09:59, AKASHI Takahiro
<takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> >> > Bhupesh, Ard,
>> >> >
>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >> >> Hi Ard, Akashi
>> >> >>
>> >> > (snip)
>> >> >
>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >> >> , for details)
>> >> >
>> >> > Right.
>> >> >
>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >> >> with the crashkernel memory range:
>> >> >>
>> >> >>         /* add linux,usable-memory-range */
>> >> >>         nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >> >>         result = fdt_setprop_range(new_buf, nodeoffset,
>> >> >>                         PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >> >>                         address_cells, size_cells);
>> >> >>
>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >> >> , for details)
>> >> >>
>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >> >> they are marked as System RAM or as RESERVED. As,
>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >> >>
>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >> >> ACPI memory and crashes while trying to access the same:
>> >> >>
>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >> >> -r`.img --reuse-cmdline -d
>> >> >>
>> >> >> [snip..]
>> >> >>
>> >> >> Reserved memory range
>> >> >> 000000000e800000-000000002e7fffff (0)
>> >> >>
>> >> >> Coredump memory ranges
>> >> >> 0000000000000000-000000000e7fffff (0)
>> >> >> 000000002e800000-000000003961ffff (0)
>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >> 000000a000000000-000000affbffffff (0)
>> >> >>
>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >> memory cap'ing passed to the crash kernel inside
>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >>
>> >> >> static void __init fdt_enforce_memory_region(void)
>> >> >> {
>> >> >>         struct memblock_region reg = {
>> >> >>                 .size = 0,
>> >> >>         };
>> >> >>
>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >>
>> >> >>         if (reg.size)
>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >> comment this out */
>> >> >> }
>> >> >
>> >> > Please just don't do that. It can cause a fatal damage on
>> >> > memory contents of the *crashed* kernel.
>> >> >
>> >> >> 5). Both the above temporary solutions fix the problem.
>> >> >>
>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >> fail.
>> >> >>
>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >> >> dt node 'linux,usable-memory-range'
>> >> >
>> >> > I still don't understand why we need to carry over the information
>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >> > such regions are free to be reused by the kernel after some point of
>> >> > initialization. Why does crash dump kernel need to know about them?
>> >> >
>> >>
>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >> kernel, those regions needs to be preserved, which is why they are
>> >> memblock_reserve()'d now.
>> >
>> > For my better understandings, who is actually accessing such regions
>> > during boot time, uefi itself or efistub?
>> >
>>
>> No, only the kernel. This is where the ACPI tables are stored. For
>> instance, on QEMU we have
>>
>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001
>> 01000013)
>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001
>> BXPC 00000001)
>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001
>> BXPC 00000001)
>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001
>> BXPC 00000001)
>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001
>> BXPC 00000001)
>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001
>> BXPC 00000001)
>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001
>> BXPC 00000001)
>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001
>> BXPC 00000001)
>>
>> covered by
>>
>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> ...
>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>
> OK. I mistakenly understood those regions could be freed after exiting
> UEFI boot services.
>
>>
>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >> when booting the next kernel.
>> >
>> > not really.
>> >
>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >> > on crash dump kernel?)
>> >> >
>> >>
>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >> regions only revealed the bug, not created it (given that other
>> >> memblock_reserve regions may be affected as well)
>> >
>> > As whether we should honor such reserved regions over kexec'ing
>> > depends on each one's specific nature, we will have to take care one-by-one.
>> > As a matter of fact, no information about "reserved" memblocks is
>> > exposed to user space (via proc/iomem).
>> >
>>
>> That is why I suggested (somewhere in this thread?) to not expose them
>> as 'System RAM'. Do you think that could solve this?
>
> Memblock-reserv'ing them is necessary to prevent their corruption and
> marking them under another name in /proc/iomem would also be good in order
> not to allocate them as part of crash kernel's memory.
>

I agree. However, this may not be entirely trivial, since iterating
over the memblock_reserved table and creating iomem entries may result
in collisions.

> But I'm not still convinced that we should export them in
> useable-memory-range to crash dump kernel. They will be accessed through
> acpi_os_map_memory() and so won't be required to be part of system ram
> (or memblocks), I guess.

Agreed. They will be covered by the linear mapping in the boot kernel,
and be mapped explicitly via ioremap_cache() in the kexec kernel,
which is exactly what we want in this case.

> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> via a kernel command line parameter, "memmap=".
>
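One possible reading of the collision concern above is that a range taken from the memblock_reserved table may partially overlap an entry that is already present in the iomem tree. A minimal sketch of such a check follows; it is a simplified stand-alone model, not the kernel's resource code (which also allows proper nesting), and 'struct res_range', 'ranges_collide()' and 'find_collision()' are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a resource entry; bounds are inclusive. */
struct res_range {
	unsigned long long start;
	unsigned long long end;
};

/* Two inclusive ranges collide when each one starts before the other ends. */
int ranges_collide(const struct res_range *a, const struct res_range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Return the index of the first existing entry that collides with
 * 'candidate', or -1 when the candidate can be added without overlap.
 */
int find_collision(const struct res_range *existing, size_t count,
		   const struct res_range *candidate)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (ranges_collide(&existing[i], candidate))
			return (int)i;
	}
	return -1;
}
```

With the ACPI reclaim ranges quoted earlier (396c0000-3975ffff and 39770000-397affff) as the existing entries, a made-up reserved range 39750000-3976ffff would collide with the first one, which is the kind of case any code that builds iomem entries from memblock_reserved would have to resolve.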
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-15 9:35 ` Ard Biesheuvel @ 2017-12-17 21:01 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-17 21:01 UTC (permalink / raw) To: Ard Biesheuvel Cc: AKASHI Takahiro, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On 15 December 2017 at 09:59, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >>> On 13 December 2017 at 12:16, AKASHI Takahiro >>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >>> >> > Bhupesh, Ard, >>> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >>> >> >> Hi Ard, Akashi >>> >> >> >>> >> > (snip) >>> >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >>> >> >> identify its own usable memory and exclude, at its boot time, any >>> >> >> other memory areas that are part of the panicked kernel's memory. >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >>> >> >> , for details) >>> >> > >>> >> > Right. >>> >> > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >>> >> >> with the crashkernel memory range: >>> >> >> >>> >> >> /* add linux,usable-memory-range */ >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >>> >> >> address_cells, size_cells); >>> >> >> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >>> >> >> , for details) >>> >> >> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >>> >> >> they are marked as System RAM or as RESERVED. As, >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >>> >> >> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >>> >> >> ACPI memory and crashes while trying to access the same: >>> >> >> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >>> >> >> -r`.img --reuse-cmdline -d >>> >> >> >>> >> >> [snip..] >>> >> >> >>> >> >> Reserved memory range >>> >> >> 000000000e800000-000000002e7fffff (0) >>> >> >> >>> >> >> Coredump memory ranges >>> >> >> 0000000000000000-000000000e7fffff (0) >>> >> >> 000000002e800000-000000003961ffff (0) >>> >> >> 0000000039d40000-000000003ed2ffff (0) >>> >> >> 000000003ed60000-000000003fbfffff (0) >>> >> >> 0000001040000000-0000001ffbffffff (0) >>> >> >> 0000002000000000-0000002ffbffffff (0) >>> >> >> 0000009000000000-0000009ffbffffff (0) >>> >> >> 000000a000000000-000000affbffffff (0) >>> >> >> >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the >>> >> >> memory cap'ing passed to the crash kernel inside >>> >> >> 'arch/arm64/mm/init.c' (see below): >>> >> >> >>> >> >> static void __init fdt_enforce_memory_region(void) >>> >> >> { >>> >> >> struct memblock_region reg = { >>> >> >> .size = 0, >>> >> >> }; >>> >> >> >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >>> >> >> >>> >> >> if (reg.size) >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >>> >> >> comment this out */ >>> >> >> } >>> >> > >>> >> > Please just don't do that. It can cause a fatal damage on >>> >> > memory contents of the *crashed* kernel. >>> >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >>> >> >> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >>> >> >> fail. >>> >> >> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >>> >> >> dt node 'linux,usable-memory-range' >>> >> > >>> >> > I still don't understand why we need to carry over the information >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >>> >> > such regions are free to be reused by the kernel after some point of >>> >> > initialization. Why does crash dump kernel need to know about them? >>> >> > >>> >> >>> >> Not really. According to the UEFI spec, they can be reclaimed after >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >>> >> no longer needs them. Of course, in order to be able to boot a kexec >>> >> kernel, those regions needs to be preserved, which is why they are >>> >> memblock_reserve()'d now. 
>>> > >>> > For my better understandings, who is actually accessing such regions >>> > during boot time, uefi itself or efistub? >>> > >>> >>> No, only the kernel. This is where the ACPI tables are stored. For >>> instance, on QEMU we have >>> >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >>> 01000013) >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >>> BXPC 00000001) >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >>> BXPC 00000001) >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >>> BXPC 00000001) >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >>> BXPC 00000001) >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >>> BXPC 00000001) >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >>> BXPC 00000001) >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >>> BXPC 00000001) >>> >>> covered by >>> >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >>> ... >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> OK. I mistakenly understood those regions could be freed after exiting >> UEFI boot services. >> >>> >>> >> So it seems that kexec does not honour the memblock_reserve() table >>> >> when booting the next kernel. >>> > >>> > not really. >>> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >>> >> > on crash dump kernel?) >>> >> > >>> >> >>> >> I don't think so. And the change to the handling of ACPI reclaim >>> >> regions only revealed the bug, not created it (given that other >>> >> memblock_reserve regions may be affected as well) >>> > >>> > As whether we should honor such reserved regions over kexec'ing >>> > depends on each one's specific nature, we will have to take care one-by-one. 
>>> > As a matter of fact, no information about "reserved" memblocks is
>>> > exposed to user space (via proc/iomem).
>>> >
>>>
>>> That is why I suggested (somewhere in this thread?) to not expose them
>>> as 'System RAM'. Do you think that could solve this?
>>
>> Memblock-reserv'ing them is necessary to prevent their corruption and
>> marking them under another name in /proc/iomem would also be good in order
>> not to allocate them as part of crash kernel's memory.
>>
>
> I agree. However, this may not be entirely trivial, since iterating
> over the memblock_reserved table and creating iomem entries may result
> in collisions.

I found a method (using the patch I shared earlier in this thread) to mark these
entries as 'ACPI reclaim memory' ranges rather than System RAM or
reserved regions.

>> But I'm not still convinced that we should export them in useable-
>> memory-range to crash dump kernel. They will be accessed through
>> acpi_os_map_memory() and so won't be required to be part of system ram
>> (or memblocks), I guess.
>
> Agreed. They will be covered by the linear mapping in the boot kernel,
> and be mapped explicitly via ioremap_cache() in the kexec kernel,
> which is exactly what we want in this case.

Now this is what is confusing me. I don't see the above happening.

I see that the primary kernel boots up and adds the ACPI regions via:
acpi_os_ioremap
-> ioremap_cache

But during the crashkernel boot, 'acpi_os_ioremap' calls
'ioremap' for the ACPI Reclaim Memory regions and not the _cache
variant.
And it fails while accessing the ACPI tables: [ 0.039205] ACPI: Core revision 20170728 pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 [ 0.095098] Internal error: Oops: 96000021 [#1] SMP [ 0.100022] Modules linked in: [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] pstate: 60000045 [ 0.132647] sp : ffff000008ccfb40 [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 [ 0.146718] x25: 000000000000001b x24: 0000000000000001 [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 [ 0.162812] x19: 000000000000001b x18: 0000000000000005 [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 [ 0.173541] x15: 0000000000000000 x14: 000000000000038e [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) [ 0.223224] Call trace: [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) [ 0.232194] fa00: 0000000000000000 ffff000009710027 ffff0000095e3980 ffff000008ccfbe0 [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 ffff000008ccfc50 0000000000000000 [ 0.248018] fa40: ffff8000126d0140 000000000000005f 00000000ffffff76 0000000000000006 [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 000000000000038e 0000000000000000 [ 0.263843] fa80: 0000000000000000 0000000000000000 
0000000000000005 000000000000001b [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 ffff000009710027 0000000000000001 [ 0.279667] fac0: 0000000000000001 000000000000001b 0000000000000000 ffff0000088be820 [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 ffff00000849b4f8 ffff000008ccfb40 [ 0.295491] fb00: ffff0000084a6764 0000000060000045 ffff000008ccfb40 ffff000008260a18 [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 ffff000008ccfb40 ffff0000084a6764 [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- [ 0.399160] Kernel panic - not syncing: Fatal exception [ 0.404437] Rebooting in 10 seconds. So, I think the linear mapping done by the primary kernel does not make these accessible in the crash kernel directly. Any pointers? Regards, Bhupesh >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> via a kernel command line parameter, "memmap=". >> ^ permalink raw reply [flat|nested] 135+ messages in thread
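Editorial sketch: the observation above (ioremap_cache in the primary kernel, plain ioremap in the crash kernel) can be modelled in a few lines of user-space C. This is only a model under stated assumptions, not the kernel's code: `struct region`, `is_memory()` and `pick_mapping()` are hypothetical names, the single primary-kernel RAM region is an assumed layout, and the physical address is read from the pte value in the oops assuming the usual descriptor layout. The crash range is the "Reserved memory range" printed by 'kexec -p -d'.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region: [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Mapping types the model can choose; the names are illustrative only. */
enum mapping { MAP_DEVICE, MAP_CACHED };

/* ASSUMPTION: the primary kernel sees one RAM region that covers the ACPI tables. */
static const struct region primary_mem[] = {
	{ 0x0000000000000000ULL, 0x000000003fc00000ULL },
};

/* From the kexec debug output: 000000000e800000-000000002e7fffff. */
static const struct region crash_range = {
	0x000000000e800000ULL, 0x0000000020000000ULL
};

/* pte=00e8000039710707 plus the 0x027 offset visible in x22 (assumed reading). */
static const uint64_t acpi_phys = 0x39710027ULL;

/* True if phys lies inside r. */
static bool region_contains(const struct region *r, uint64_t phys)
{
	return phys >= r->base && phys - r->base < r->size;
}

/*
 * Model of "is this address RAM?": the address must be in one of the
 * known regions and, if a usable-memory cap is given, inside the cap.
 */
static bool is_memory(const struct region *mem, size_t n,
		      const struct region *cap, uint64_t phys)
{
	if (cap && !region_contains(cap, phys))
		return false;
	for (size_t i = 0; i < n; i++)
		if (region_contains(&mem[i], phys))
			return true;
	return false;
}

/* RAM gets a cacheable mapping; everything else gets a device mapping. */
static enum mapping pick_mapping(const struct region *mem, size_t n,
				 const struct region *cap, uint64_t phys)
{
	return is_memory(mem, n, cap, phys) ? MAP_CACHED : MAP_DEVICE;
}
```

With these inputs, pick_mapping(primary_mem, 1, NULL, acpi_phys) gives MAP_CACHED, while pick_mapping(primary_mem, 1, &crash_range, acpi_phys) gives MAP_DEVICE, because the ACPI address lies above the end of the crash kernel's usable range.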
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-17 21:01 ` Bhupesh Sharma (?) (?) @ 2017-12-18 5:16 ` Dave Young -1 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-18 5:16 UTC (permalink / raw) To: Bhupesh Sharma Cc: Ard Biesheuvel, kexec, linux-acpi, linux-kernel, AKASHI Takahiro, linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it to kexec@lists.infradead.org Also add linux-acpi list On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: > > On 15 December 2017 at 09:59, AKASHI Takahiro > > <takahiro.akashi@linaro.org> wrote: > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >>> <takahiro.akashi@linaro.org> wrote: > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >>> >> <takahiro.akashi@linaro.org> wrote: > >>> >> > Bhupesh, Ard, > >>> >> > > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >>> >> >> Hi Ard, Akashi > >>> >> >> > >>> >> > (snip) > >>> >> > > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >>> >> >> identify its own usable memory and exclude, at its boot time, any > >>> >> >> other memory areas that are part of the panicked kernel's memory. > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >>> >> >> , for details) > >>> >> > > >>> >> > Right. > >>> >> > > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >>> >> >> with the crashkernel memory range: > >>> >> >> > >>> >> >> /* add linux,usable-memory-range */ > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >>> >> >> address_cells, size_cells); > >>> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >>> >> >> , for details) > >>> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >>> >> >> they are marked as System RAM or as RESERVED. As, > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >>> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >>> >> >> ACPI memory and crashes while trying to access the same: > >>> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >>> >> >> -r`.img --reuse-cmdline -d > >>> >> >> > >>> >> >> [snip..] > >>> >> >> > >>> >> >> Reserved memory range > >>> >> >> 000000000e800000-000000002e7fffff (0) > >>> >> >> > >>> >> >> Coredump memory ranges > >>> >> >> 0000000000000000-000000000e7fffff (0) > >>> >> >> 000000002e800000-000000003961ffff (0) > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >>> >> >> 000000003ed60000-000000003fbfffff (0) > >>> >> >> 0000001040000000-0000001ffbffffff (0) > >>> >> >> 0000002000000000-0000002ffbffffff (0) > >>> >> >> 0000009000000000-0000009ffbffffff (0) > >>> >> >> 000000a000000000-000000affbffffff (0) > >>> >> >> > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the
> >>> >> >> memory cap'ing passed to the crash kernel inside
> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >>> >> >>
> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >>> >> >> {
> >>> >> >>         struct memblock_region reg = {
> >>> >> >>                 .size = 0,
> >>> >> >>         };
> >>> >> >>
> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> >> >>
> >>> >> >>         if (reg.size)
> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >>> >> >> }
> >>> >> >
> >>> >> > Please just don't do that. It can cause a fatal damage on
> >>> >> > memory contents of the *crashed* kernel.
> >>> >> >
> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >>> >> >>
> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> >> >> fail.
> >>> >> >>
> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >>> >> >> dt node 'linux,usable-memory-range'
> >>> >> >
> >>> >> > I still don't understand why we need to carry over the information
> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >>> >> > such regions are free to be reused by the kernel after some point of
> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >>> >> >
> >>> >>
> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >>> >> no longer needs them.
Of course, in order to be able to boot a kexec > >>> >> kernel, those regions needs to be preserved, which is why they are > >>> >> memblock_reserve()'d now. > >>> > > >>> > For my better understandings, who is actually accessing such regions > >>> > during boot time, uefi itself or efistub? > >>> > > >>> > >>> No, only the kernel. This is where the ACPI tables are stored. For > >>> instance, on QEMU we have > >>> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >>> 01000013) > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >>> BXPC 00000001) > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >>> BXPC 00000001) > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >>> BXPC 00000001) > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >>> BXPC 00000001) > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >>> BXPC 00000001) > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >>> BXPC 00000001) > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >>> BXPC 00000001) > >>> > >>> covered by > >>> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >>> ... > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> > >> OK. I mistakenly understood those regions could be freed after exiting > >> UEFI boot services. > >> > >>> > >>> >> So it seems that kexec does not honour the memblock_reserve() table > >>> >> when booting the next kernel. > >>> > > >>> > not really. > >>> > > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >>> >> > on crash dump kernel?) > >>> >> > > >>> >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim
> >>> >> regions only revealed the bug, not created it (given that other
> >>> >> memblock_reserve regions may be affected as well)
> >>> >
> >>> > As whether we should honor such reserved regions over kexec'ing
> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >>> > As a matter of fact, no information about "reserved" memblocks is
> >>> > exposed to user space (via proc/iomem).
> >>> >
> >>>
> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >>> as 'System RAM'. Do you think that could solve this?
> >>
> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> marking them under another name in /proc/iomem would also be good in order
> >> not to allocate them as part of crash kernel's memory.
> >>
> >
> > I agree. However, this may not be entirely trivial, since iterating
> > over the memblock_reserved table and creating iomem entries may result
> > in collisions.
>
> I found a method (using the patch I shared earlier in this thread) to mark these
> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> reserved regions.
>
> >> But I'm not still convinced that we should export them in useable-
> >> memory-range to crash dump kernel. They will be accessed through
> >> acpi_os_map_memory() and so won't be required to be part of system ram
> >> (or memblocks), I guess.
> >
> > Agreed. They will be covered by the linear mapping in the boot kernel,
> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> > which is exactly what we want in this case.
>
> Now this is what is confusing me. I don't see the above happening.
>
> I see that the primary kernel boots up and adds the ACPI regions via:
> acpi_os_ioremap
> -> ioremap_cache
>
> But during the crashkernel boot, 'acpi_os_ioremap' calls
> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> variant.
> > And it fails while accessing the ACPI tables: > > [ 0.039205] ACPI: Core revision 20170728 > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > [ 0.100022] Modules linked in: > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > pstate: 60000045 > [ 0.132647] sp : ffff000008ccfb40 > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > [ 0.223224] Call trace: > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > ffff0000095e3980 ffff000008ccfbe0 > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > ffff000008ccfc50 0000000000000000 > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > 00000000ffffff76 0000000000000006 > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > 
000000000000038e 0000000000000000 > [ 0.263843] fa80: 0000000000000000 0000000000000000 > 0000000000000005 000000000000001b > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > ffff000009710027 0000000000000001 > [ 0.279667] fac0: 0000000000000001 000000000000001b > 0000000000000000 ffff0000088be820 > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > ffff00000849b4f8 ffff000008ccfb40 > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > ffff000008ccfb40 ffff000008260a18 > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > ffff000008ccfb40 ffff0000084a6764 > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > [ 0.399160] Kernel panic - not syncing: Fatal exception > [ 0.404437] Rebooting in 10 seconds. > > So, I think the linear mapping done by the primary kernel does not > make these accessible in the crash kernel directly. > > Any pointers? Can you get the code line number for acpi_ns_lookup+0x25c? > > Regards, > Bhupesh > > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> via a kernel command line parameter, "memmap=". 
> >> > _______________________________________________ > kexec mailing list -- kexec@lists.fedoraproject.org > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
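The oops quoted above can be cross-checked by decoding the two raw values it prints, the ESR (96000021) and the pte (00e8000039710707). What follows is only a minimal sketch, assuming the ARMv8-A ESR_ELx layout (exception class in bits [31:26], data fault status code in bits [5:0]) and a stage 1 page descriptor with the output address in bits [47:12] and AttrIndx in bits [4:2]. The helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* ESR_ELx: the exception class (EC) lives in bits [31:26]. */
static inline unsigned int esr_exception_class(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* For data aborts, the fault status code (DFSC) is bits [5:0]. */
static inline unsigned int esr_fault_status(uint32_t esr)
{
	return esr & 0x3f;
}

/* Output address of a valid page descriptor, assuming bits [47:12]. */
static inline uint64_t pte_output_addr(uint64_t pte)
{
	assert(pte & 1);	/* bit 0 set: the descriptor is valid */
	return pte & 0x0000fffffffff000ULL;
}

/* AttrIndx (the index into MAIR_EL1) is bits [4:2] of the descriptor. */
static inline unsigned int pte_attr_index(uint64_t pte)
{
	return (pte >> 2) & 0x7;
}
```

With the values from the oops, esr_exception_class(0x96000021) gives 0x25 (data abort taken without a change in exception level) and esr_fault_status(0x96000021) gives 0x21 (alignment fault). pte_output_addr(0x00e8000039710707) gives 0x39710000 and pte_attr_index() gives 1. If index 1 is a Device memory attribute in this kernel's MAIR_EL1 setup (an assumption, not something the thread states), that would fit the observation that the crash kernel mapped the region with ioremap() rather than ioremap_cache(): x22 = ffff000009710027 is not 4-byte aligned, and unaligned accesses to Device memory fault.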
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 5:16 ` Dave Young 0 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-18 5:16 UTC (permalink / raw) To: Bhupesh Sharma Cc: Mark Rutland, linux-efi, AKASHI Takahiro, Matt Fleming, Ard Biesheuvel, kexec, linux-kernel, linux-acpi, James Morse, Bhupesh SHARMA, linux-arm-kernel kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it to kexec@lists.infradead.org Also add linux-acpi list On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: > > On 15 December 2017 at 09:59, AKASHI Takahiro > > <takahiro.akashi@linaro.org> wrote: > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >>> <takahiro.akashi@linaro.org> wrote: > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >>> >> <takahiro.akashi@linaro.org> wrote: > >>> >> > Bhupesh, Ard, > >>> >> > > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >>> >> >> Hi Ard, Akashi > >>> >> >> > >>> >> > (snip) > >>> >> > > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >>> >> >> identify its own usable memory and exclude, at its boot time, any > >>> >> >> other memory areas that are part of the panicked kernel's memory. > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >>> >> >> , for details) > >>> >> > > >>> >> > Right. > >>> >> > > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >>> >> >> with the crashkernel memory range: > >>> >> >> > >>> >> >> /* add linux,usable-memory-range */ > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >>> >> >> address_cells, size_cells); > >>> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >>> >> >> , for details) > >>> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >>> >> >> they are marked as System RAM or as RESERVED. As, > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >>> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >>> >> >> ACPI memory and crashes while trying to access the same: > >>> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >>> >> >> -r`.img --reuse-cmdline -d > >>> >> >> > >>> >> >> [snip..] > >>> >> >> > >>> >> >> Reserved memory range > >>> >> >> 000000000e800000-000000002e7fffff (0) > >>> >> >> > >>> >> >> Coredump memory ranges > >>> >> >> 0000000000000000-000000000e7fffff (0) > >>> >> >> 000000002e800000-000000003961ffff (0) > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >>> >> >> 000000003ed60000-000000003fbfffff (0) > >>> >> >> 0000001040000000-0000001ffbffffff (0) > >>> >> >> 0000002000000000-0000002ffbffffff (0) > >>> >> >> 0000009000000000-0000009ffbffffff (0) > >>> >> >> 000000a000000000-000000affbffffff (0) > >>> >> >> > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > >>> >> >> memory cap'ing passed to the crash kernel inside > >>> >> >> 'arch/arm64/mm/init.c' (see below): > >>> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) > >>> >> >> { > >>> >> >> struct memblock_region reg = { > >>> >> >> .size = 0, > >>> >> >> }; > >>> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >>> >> >> > >>> >> >> if (reg.size) > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >>> >> >> comment this out */ > >>> >> >> } > >>> >> > > >>> >> > Please just don't do that. It can cause a fatal damage on > >>> >> > memory contents of the *crashed* kernel. > >>> >> > > >>> >> >> 5). Both the above temporary solutions fix the problem. > >>> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >>> >> >> fail. > >>> >> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >>> >> >> dt node 'linux,usable-memory-range' > >>> >> > > >>> >> > I still don't understand why we need to carry over the information > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >>> >> > such regions are free to be reused by the kernel after some point of > >>> >> > initialization. Why does crash dump kernel need to know about them? > >>> >> > > >>> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >>> >> no longer needs them.
Of course, in order to be able to boot a kexec > >>> >> kernel, those regions needs to be preserved, which is why they are > >>> >> memblock_reserve()'d now. > >>> > > >>> > For my better understandings, who is actually accessing such regions > >>> > during boot time, uefi itself or efistub? > >>> > > >>> > >>> No, only the kernel. This is where the ACPI tables are stored. For > >>> instance, on QEMU we have > >>> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >>> 01000013) > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >>> BXPC 00000001) > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >>> BXPC 00000001) > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >>> BXPC 00000001) > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >>> BXPC 00000001) > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >>> BXPC 00000001) > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >>> BXPC 00000001) > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >>> BXPC 00000001) > >>> > >>> covered by > >>> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >>> ... > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> > >> OK. I mistakenly understood those regions could be freed after exiting > >> UEFI boot services. > >> > >>> > >>> >> So it seems that kexec does not honour the memblock_reserve() table > >>> >> when booting the next kernel. > >>> > > >>> > not really. > >>> > > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >>> >> > on crash dump kernel?) > >>> >> > > >>> >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > >>> >> regions only revealed the bug, not created it (given that other > >>> >> memblock_reserve regions may be affected as well) > >>> > > >>> > As whether we should honor such reserved regions over kexec'ing > >>> > depends on each one's specific nature, we will have to take care one-by-one. > >>> > As a matter of fact, no information about "reserved" memblocks is > >>> > exposed to user space (via /proc/iomem). > >>> > > >>> > >>> That is why I suggested (somewhere in this thread?) to not expose them > >>> as 'System RAM'. Do you think that could solve this? > >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> marking them under another name in /proc/iomem would also be good in order > >> not to allocate them as part of crash kernel's memory. > >> > > > > I agree. However, this may not be entirely trivial, since iterating > > over the memblock_reserved table and creating iomem entries may result > > in collisions. > > I found a method (using the patch I shared earlier in this thread) to mark these > entries as 'ACPI reclaim memory' ranges rather than System RAM or > reserved regions. > > >> But I'm not still convinced that we should export them in useable- > >> memory-range to crash dump kernel. They will be accessed through > >> acpi_os_map_memory() and so won't be required to be part of system ram > >> (or memblocks), I guess. > > > > Agreed. They will be covered by the linear mapping in the boot kernel, > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > > which is exactly what we want in this case. > > Now this is what is confusing me. I don't see the above happening. > > I see that the primary kernel boots up and adds the ACPI regions via: > acpi_os_ioremap > -> ioremap_cache > > But during the crashkernel boot, 'acpi_os_ioremap' calls > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > variant.
> > And it fails while accessing the ACPI tables: > > [ 0.039205] ACPI: Core revision 20170728 > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > [ 0.100022] Modules linked in: > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > pstate: 60000045 > [ 0.132647] sp : ffff000008ccfb40 > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > [ 0.223224] Call trace: > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > ffff0000095e3980 ffff000008ccfbe0 > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > ffff000008ccfc50 0000000000000000 > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > 00000000ffffff76 0000000000000006 > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > 
000000000000038e 0000000000000000 > [ 0.263843] fa80: 0000000000000000 0000000000000000 > 0000000000000005 000000000000001b > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > ffff000009710027 0000000000000001 > [ 0.279667] fac0: 0000000000000001 000000000000001b > 0000000000000000 ffff0000088be820 > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > ffff00000849b4f8 ffff000008ccfb40 > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > ffff000008ccfb40 ffff000008260a18 > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > ffff000008ccfb40 ffff0000084a6764 > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > [ 0.399160] Kernel panic - not syncing: Fatal exception > [ 0.404437] Rebooting in 10 seconds. > > So, I think the linear mapping done by the primary kernel does not > make these accessible in the crash kernel directly. > > Any pointers? Can you get the code line number for acpi_ns_lookup+0x25c? > > Regards, > Bhupesh > > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> via a kernel command line parameter, "memmap=". 
> >> > _______________________________________________ > kexec mailing list -- kexec@lists.fedoraproject.org > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 135+ messages in thread
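Points 1) to 3) in the quoted mail can be modelled in user space. The following is a minimal sketch, assuming the crash kernel keeps only the memory inside the single 'linux,usable-memory-range' window. struct mem_range, range_contains() and cap_ranges() are hypothetical names used for illustration; they are not kexec-tools or kernel code, and cap_ranges() is a toy stand-in rather than the kernel's memblock_cap_memory_range(). The ranges come from the 'kexec -d' output quoted above, and 0x39710000 is the physical address the pte in the oops (00e8000039710707) points at, assuming the output address sits in bits [47:12].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive physical range, same convention as the 'kexec -d' listing. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* True if 'addr' lies inside the inclusive range 'r'. */
static inline bool range_contains(const struct mem_range *r, uint64_t addr)
{
	return addr >= r->start && addr <= r->end;
}

/*
 * Toy model of capping: clip every range to 'cap', drop whatever falls
 * outside, compact the array in place and return the new count.
 * This is NOT the kernel's memblock_cap_memory_range() implementation.
 */
static inline size_t cap_ranges(struct mem_range *ranges, size_t n,
				const struct mem_range *cap)
{
	size_t i, kept = 0;

	assert(cap->start <= cap->end);

	for (i = 0; i < n; i++) {
		uint64_t s = ranges[i].start > cap->start ? ranges[i].start : cap->start;
		uint64_t e = ranges[i].end < cap->end ? ranges[i].end : cap->end;

		if (s > e)
			continue;	/* no overlap with the usable window */
		ranges[kept].start = s;
		ranges[kept].end = e;
		kept++;
	}
	return kept;
}
```

Capping the listed ranges to 000000000e800000-000000002e7fffff leaves only the crash kernel's own reservation, and range_contains() reports that 0x39710000 lies outside it. That is in line with the point made in the thread that the crash kernel cannot reach the ACPI tables through its own memory map and has to map them explicitly.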
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 5:16 ` Dave Young 0 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-18 5:16 UTC (permalink / raw) To: Bhupesh Sharma Cc: Ard Biesheuvel, kexec, linux-acpi, linux-kernel, AKASHI Takahiro, linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it to kexec@lists.infradead.org Also add linux-acpi list On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: > > On 15 December 2017 at 09:59, AKASHI Takahiro > > <takahiro.akashi@linaro.org> wrote: > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >>> <takahiro.akashi@linaro.org> wrote: > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >>> >> <takahiro.akashi@linaro.org> wrote: > >>> >> > Bhupesh, Ard, > >>> >> > > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >>> >> >> Hi Ard, Akashi > >>> >> >> > >>> >> > (snip) > >>> >> > > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >>> >> >> identify its own usable memory and exclude, at its boot time, any > >>> >> >> other memory areas that are part of the panicked kernel's memory. > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >>> >> >> , for details) > >>> >> > > >>> >> > Right. > >>> >> > > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >>> >> >> with the crashkernel memory range: > >>> >> >> > >>> >> >> /* add linux,usable-memory-range */ > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >>> >> >> address_cells, size_cells); > >>> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >>> >> >> , for details) > >>> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >>> >> >> they are marked as System RAM or as RESERVED. As, > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >>> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >>> >> >> ACPI memory and crashes while trying to access the same: > >>> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >>> >> >> -r`.img --reuse-cmdline -d > >>> >> >> > >>> >> >> [snip..] > >>> >> >> > >>> >> >> Reserved memory range > >>> >> >> 000000000e800000-000000002e7fffff (0) > >>> >> >> > >>> >> >> Coredump memory ranges > >>> >> >> 0000000000000000-000000000e7fffff (0) > >>> >> >> 000000002e800000-000000003961ffff (0) > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >>> >> >> 000000003ed60000-000000003fbfffff (0) > >>> >> >> 0000001040000000-0000001ffbffffff (0) > >>> >> >> 0000002000000000-0000002ffbffffff (0) > >>> >> >> 0000009000000000-0000009ffbffffff (0) > >>> >> >> 000000a000000000-000000affbffffff (0) > >>> >> >> > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > >>> >> >> memory cap'ing passed to the crash kernel inside > >>> >> >> 'arch/arm64/mm/init.c' (see below): > >>> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) > >>> >> >> { > >>> >> >> struct memblock_region reg = { > >>> >> >> .size = 0, > >>> >> >> }; > >>> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >>> >> >> > >>> >> >> if (reg.size) > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >>> >> >> comment this out */ > >>> >> >> } > >>> >> > > >>> >> > Please just don't do that. It can cause a fatal damage on > >>> >> > memory contents of the *crashed* kernel. > >>> >> > > >>> >> >> 5). Both the above temporary solutions fix the problem. > >>> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >>> >> >> fail. > >>> >> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >>> >> >> dt node 'linux,usable-memory-range' > >>> >> > > >>> >> > I still don't understand why we need to carry over the information > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >>> >> > such regions are free to be reused by the kernel after some point of > >>> >> > initialization. Why does crash dump kernel need to know about them? > >>> >> > > >>> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >>> >> no longer needs them.
Of course, in order to be able to boot a kexec > >>> >> kernel, those regions needs to be preserved, which is why they are > >>> >> memblock_reserve()'d now. > >>> > > >>> > For my better understandings, who is actually accessing such regions > >>> > during boot time, uefi itself or efistub? > >>> > > >>> > >>> No, only the kernel. This is where the ACPI tables are stored. For > >>> instance, on QEMU we have > >>> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >>> 01000013) > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >>> BXPC 00000001) > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >>> BXPC 00000001) > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >>> BXPC 00000001) > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >>> BXPC 00000001) > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >>> BXPC 00000001) > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >>> BXPC 00000001) > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >>> BXPC 00000001) > >>> > >>> covered by > >>> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >>> ... > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> > >> OK. I mistakenly understood those regions could be freed after exiting > >> UEFI boot services. > >> > >>> > >>> >> So it seems that kexec does not honour the memblock_reserve() table > >>> >> when booting the next kernel. > >>> > > >>> > not really. > >>> > > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >>> >> > on crash dump kernel?) > >>> >> > > >>> >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > >>> >> regions only revealed the bug, not created it (given that other > >>> >> memblock_reserve regions may be affected as well) > >>> > > >>> > As whether we should honor such reserved regions over kexec'ing > >>> > depends on each one's specific nature, we will have to take care one-by-one. > >>> > As a matter of fact, no information about "reserved" memblocks is > >>> > exposed to user space (via proc/iomem). > >>> > > >>> > >>> That is why I suggested (somewhere in this thread?) to not expose them > >>> as 'System RAM'. Do you think that could solve this? > >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> marking them under another name in /proc/iomem would also be good in order > >> not to allocate them as part of crash kernel's memory. > >> > > > > I agree. However, this may not be entirely trivial, since iterating > > over the memblock_reserved table and creating iomem entries may result > > in collisions. > > I found a method (using the patch I shared earlier in this thread) to mark these > entries as 'ACPI reclaim memory' ranges rather than System RAM or > reserved regions. > > >> But I'm not still convinced that we should export them in useable- > >> memory-range to crash dump kernel. They will be accessed through > >> acpi_os_map_memory() and so won't be required to be part of system ram > >> (or memblocks), I guess. > > > > Agreed. They will be covered by the linear mapping in the boot kernel, > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > > which is exactly what we want in this case. > > Now this is what is confusing me. I don't see the above happening. > > I see that the primary kernel boots up and adds the ACPI regions via: > acpi_os_ioremap > -> ioremap_cache > > But during the crashkernel boot, ''acpi_os_ioremap' calls > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > variant. 
> > And it fails while accessing the ACPI tables: > > [ 0.039205] ACPI: Core revision 20170728 > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > [ 0.100022] Modules linked in: > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > pstate: 60000045 > [ 0.132647] sp : ffff000008ccfb40 > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > [ 0.223224] Call trace: > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > ffff0000095e3980 ffff000008ccfbe0 > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > ffff000008ccfc50 0000000000000000 > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > 00000000ffffff76 0000000000000006 > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > 
000000000000038e 0000000000000000 > [ 0.263843] fa80: 0000000000000000 0000000000000000 > 0000000000000005 000000000000001b > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > ffff000009710027 0000000000000001 > [ 0.279667] fac0: 0000000000000001 000000000000001b > 0000000000000000 ffff0000088be820 > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > ffff00000849b4f8 ffff000008ccfb40 > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > ffff000008ccfb40 ffff000008260a18 > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > ffff000008ccfb40 ffff0000084a6764 > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > [ 0.399160] Kernel panic - not syncing: Fatal exception > [ 0.404437] Rebooting in 10 seconds. > > So, I think the linear mapping done by the primary kernel does not > make these accessible in the crash kernel directly. > > Any pointers? Can you get the code line number for acpi_ns_lookup+0x25c? > > Regards, > Bhupesh > > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> via a kernel command line parameter, "memmap=". 
> >> > _______________________________________________ > kexec mailing list -- kexec@lists.fedoraproject.org > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
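The observation quoted above (ioremap_cache() in the primary kernel, plain ioremap() in the crash kernel) can be illustrated with a small user-space model. This is only a sketch under assumptions: it assumes that the arm64 acpi_os_ioremap() of that time picks the mapping type by asking whether the physical address lies in memblock memory, and that capping memblock to 'linux,usable-memory-range' removes the ACPI tables from that set. The names struct range, is_memory(), acpi_map_type() and primary_mem are hypothetical and not kernel APIs. The crash kernel range is the "Reserved memory range" from the kexec output, the table address 0x39710000 is derived from the pte value in the oops, and the primary kernel range is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive physical address range (hypothetical helper type). */
struct range {
	uint64_t start;
	uint64_t end;
};

enum map_type {
	MAP_DEVICE,	/* models ioremap(): Device memory attributes */
	MAP_CACHED	/* models ioremap_cache(): Normal cacheable memory */
};

/* Memory visible to the crash kernel after capping (from the kexec -d output). */
const struct range crash_usable[] = {
	{ 0x0e800000ULL, 0x2e7fffffULL },
};

/* ASSUMPTION: an illustrative primary-kernel range that still covers the tables. */
const struct range primary_mem[] = {
	{ 0x00000000ULL, 0x3fbfffffULL },
};

/* Physical address of the faulting page, taken from pte=00e8000039710707. */
const uint64_t acpi_table_phys = 0x39710000ULL;

/* Stand-in for a "is this address in memblock memory?" query. */
int is_memory(const struct range *mem, size_t n, uint64_t phys)
{
	for (size_t i = 0; i < n; i++)
		if (phys >= mem[i].start && phys <= mem[i].end)
			return 1;
	return 0;
}

/* Models the assumed decision: cached mapping only for known memory. */
enum map_type acpi_map_type(const struct range *mem, size_t n, uint64_t phys)
{
	return is_memory(mem, n, phys) ? MAP_CACHED : MAP_DEVICE;
}
```

With these tables, acpi_map_type(primary_mem, 1, acpi_table_phys) yields MAP_CACHED while acpi_map_type(crash_usable, 1, acpi_table_phys) yields MAP_DEVICE, which would match the behaviour reported for the two kernels if the assumptions hold.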
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-18 5:16 ` Dave Young (?) (?) @ 2017-12-18 5:54 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-18 5:54 UTC (permalink / raw) To: Dave Young Cc: Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it > to kexec@lists.infradead.org > > Also add linux-acpi list Thank you. > On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > > <ard.biesheuvel@linaro.org> wrote: > > > On 15 December 2017 at 09:59, AKASHI Takahiro > > > <takahiro.akashi@linaro.org> wrote: > > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > > >>> <takahiro.akashi@linaro.org> wrote: > > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > > >>> >> <takahiro.akashi@linaro.org> wrote: > > >>> >> > Bhupesh, Ard, > > >>> >> > > > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > > >>> >> >> Hi Ard, Akashi > > >>> >> >> > > >>> >> > (snip) > > >>> >> > > > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > > >>> >> >> identify its own usable memory and exclude, at its boot time, any > > >>> >> >> other memory areas that are part of the panicked kernel's memory. > > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > > >>> >> >> , for details) > > >>> >> > > > >>> >> > Right. > > >>> >> > > > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > > >>> >> >> with the crashkernel memory range: > > >>> >> >> > > >>> >> >> /* add linux,usable-memory-range */ > > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > > >>> >> >> address_cells, size_cells); > > >>> >> >> > > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > > >>> >> >> , for details) > > >>> >> >> > > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > > >>> >> >> they are marked as System RAM or as RESERVED. As, > > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > > >>> >> >> > > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > > >>> >> >> ACPI memory and crashes while trying to access the same: > > >>> >> >> > > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > > >>> >> >> -r`.img --reuse-cmdline -d > > >>> >> >> > > >>> >> >> [snip..] > > >>> >> >> > > >>> >> >> Reserved memory range > > >>> >> >> 000000000e800000-000000002e7fffff (0) > > >>> >> >> > > >>> >> >> Coredump memory ranges > > >>> >> >> 0000000000000000-000000000e7fffff (0) > > >>> >> >> 000000002e800000-000000003961ffff (0) > > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > > >>> >> >> 000000003ed60000-000000003fbfffff (0) > > >>> >> >> 0000001040000000-0000001ffbffffff (0) > > >>> >> >> 0000002000000000-0000002ffbffffff (0) > > >>> >> >> 0000009000000000-0000009ffbffffff (0) > > >>> >> >> 000000a000000000-000000affbffffff (0) > > >>> >> >> > > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > > >>> >> >> memory cap'ing passed to the crash kernel inside > > >>> >> >> 'arch/arm64/mm/init.c' (see below): > > >>> >> >> > > >>> >> >> static void __init fdt_enforce_memory_region(void) > > >>> >> >> { > > >>> >> >> struct memblock_region reg = { > > >>> >> >> .size = 0, > > >>> >> >> }; > > >>> >> >> > > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > >>> >> >> > > >>> >> >> if (reg.size) > > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > > >>> >> >> comment this out */ > > >>> >> >> } > > >>> >> > > > >>> >> > Please just don't do that. It can cause a fatal damage on > > >>> >> > memory contents of the *crashed* kernel. > > >>> >> > > > >>> >> >> 5). Both the above temporary solutions fix the problem. > > >>> >> >> > > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > > >>> >> >> fail. > > >>> >> >> > > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > > >>> >> >> dt node 'linux,usable-memory-range' > > >>> >> > > > >>> >> > I still don't understand why we need to carry over the information > > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > > >>> >> > such regions are free to be reused by the kernel after some point of > > >>> >> > initialization. Why does crash dump kernel need to know about them? > > >>> >> > > > >>> >> > > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > > >>> >> no longer needs them.
Of course, in order to be able to boot a kexec > > >>> >> kernel, those regions needs to be preserved, which is why they are > > >>> >> memblock_reserve()'d now. > > >>> > > > >>> > For my better understandings, who is actually accessing such regions > > >>> > during boot time, uefi itself or efistub? > > >>> > > > >>> > > >>> No, only the kernel. This is where the ACPI tables are stored. For > > >>> instance, on QEMU we have > > >>> > > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > > >>> 01000013) > > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > > >>> BXPC 00000001) > > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > > >>> BXPC 00000001) > > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > > >>> BXPC 00000001) > > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > > >>> BXPC 00000001) > > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > > >>> BXPC 00000001) > > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > > >>> BXPC 00000001) > > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > > >>> BXPC 00000001) > > >>> > > >>> covered by > > >>> > > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > > >>> ... > > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > > >> > > >> OK. I mistakenly understood those regions could be freed after exiting > > >> UEFI boot services. > > >> > > >>> > > >>> >> So it seems that kexec does not honour the memblock_reserve() table > > >>> >> when booting the next kernel. > > >>> > > > >>> > not really. > > >>> > > > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > > >>> >> > on crash dump kernel?) > > >>> >> > > > >>> >> > > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > > >>> >> regions only revealed the bug, not created it (given that other > > >>> >> memblock_reserve regions may be affected as well) > > >>> > > > >>> > As whether we should honor such reserved regions over kexec'ing > > >>> > depends on each one's specific nature, we will have to take care one-by-one. > > >>> > As a matter of fact, no information about "reserved" memblocks is > > >>> > exposed to user space (via proc/iomem). > > >>> > > > >>> > > >>> That is why I suggested (somewhere in this thread?) to not expose them > > >>> as 'System RAM'. Do you think that could solve this? > > >> > > >> Memblock-reserv'ing them is necessary to prevent their corruption and > > >> marking them under another name in /proc/iomem would also be good in order > > >> not to allocate them as part of crash kernel's memory. > > >> > > > > > > I agree. However, this may not be entirely trivial, since iterating > > > over the memblock_reserved table and creating iomem entries may result > > > in collisions. > > > > I found a method (using the patch I shared earlier in this thread) to mark these > > entries as 'ACPI reclaim memory' ranges rather than System RAM or > > reserved regions. > > > > >> But I'm not still convinced that we should export them in useable- > > >> memory-range to crash dump kernel. They will be accessed through > > >> acpi_os_map_memory() and so won't be required to be part of system ram > > >> (or memblocks), I guess. > > > > > > Agreed. They will be covered by the linear mapping in the boot kernel, > > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > > > which is exactly what we want in this case. > > > > Now this is what is confusing me. I don't see the above happening. 
> >
> > I see that the primary kernel boots up and adds the ACPI regions via:
> > acpi_os_ioremap
> > -> ioremap_cache
> >
> > But during the crashkernel boot, 'acpi_os_ioremap' calls
> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> > variant.

That is expected if the region is outside the memblock memory.

> > And it fails while accessing the ACPI tables:
> >
> > [ 0.039205] ACPI: Core revision 20170728
> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP

This (ESR = 0x96000021) means that a Data Abort caused by an alignment
fault happened. As ioremap() maps the region as "Device memory",
unaligned memory accesses are not allowed.

> > [ 0.100022] Modules linked in:
> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
> > pstate: 60000045
> > [ 0.132647] sp : ffff000008ccfb40
> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000
> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001
> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027
> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005
> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000
> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e
> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76
> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
> > [ 0.211091] x1 :
ffff000009710027 x0 : 0000000000000000 > > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > > [ 0.223224] Call trace: > > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > > ffff0000095e3980 ffff000008ccfbe0 > > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > > ffff000008ccfc50 0000000000000000 > > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > > 00000000ffffff76 0000000000000006 > > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > > 000000000000038e 0000000000000000 > > [ 0.263843] fa80: 0000000000000000 0000000000000000 > > 0000000000000005 000000000000001b > > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > > ffff000009710027 0000000000000001 > > [ 0.279667] fac0: 0000000000000001 000000000000001b > > 0000000000000000 ffff0000088be820 > > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > > ffff00000849b4f8 ffff000008ccfb40 > > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > > ffff000008ccfb40 ffff000008260a18 > > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > > ffff000008ccfb40 ffff0000084a6764 > > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > > [ 0.382891] 
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c
> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]---
> > [ 0.399160] Kernel panic - not syncing: Fatal exception
> > [ 0.404437] Rebooting in 10 seconds.
> >
> > So, I think the linear mapping done by the primary kernel does not
> > make these accessible in the crash kernel directly.
> >
> > Any pointers?
>
> Can you get the code line number for acpi_ns_lookup+0x25c?

So should we always avoid ioremap() in acpi_os_ioremap() entirely,
or modify acpi_ns_lookup() (or any other ACPI functions) to prevent
unaligned accesses?
(I could not figure out how unaligned accesses could happen there.)

Thanks,
-Takahiro AKASHI

> >
> > Regards,
> > Bhupesh
> >
> > >> Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
> > >> via a kernel command line parameter, "memmap=".
> > >>
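The reading of ESR = 0x96000021 given in the message above can be checked with a few lines of C. This is a minimal sketch, assuming the ESR_ELx field layout from the Arm architecture manual (exception class in bits [31:26], fault status code in bits [5:0]); the macro and function names are illustrative and are not the kernel's own definitions. The is_unaligned() helper additionally checks the value of x22 from the register dump, on the assumption that the faulting instruction word b94002c0 is a 32-bit load through x22.

```c
#include <stdint.h>
#include <stdio.h>

/* ESR_ELx fields (layout assumed from the Arm architecture manual). */
#define ESR_EC_SHIFT		26
#define ESR_EC_MASK		0x3fu
#define ESR_EC_DABT_CUR		0x25u	/* Data Abort without a change in EL */
#define ESR_FSC_MASK		0x3fu
#define ESR_FSC_ALIGNMENT	0x21u	/* fault status code: alignment fault */

/* Exception class, bits [31:26]. */
unsigned int esr_ec(uint32_t esr)
{
	return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Fault status code, bits [5:0] of the ISS field. */
unsigned int esr_fsc(uint32_t esr)
{
	return esr & ESR_FSC_MASK;
}

/* True for a Data Abort at the current EL caused by an alignment fault. */
int esr_is_alignment_dabt(uint32_t esr)
{
	return esr_ec(esr) == ESR_EC_DABT_CUR &&
	       esr_fsc(esr) == ESR_FSC_ALIGNMENT;
}

/* True if addr is not aligned to size bytes; size must be a power of two. */
int is_unaligned(uint64_t addr, unsigned int size)
{
	return (addr & (uint64_t)(size - 1)) != 0;
}

/* Print a one-line summary of the decoded fields. */
void esr_report(uint32_t esr, uint64_t addr, unsigned int size)
{
	printf("ESR 0x%08x: EC=0x%02x FSC=0x%02x alignment-dabt=%d unaligned=%d\n",
	       (unsigned int)esr, esr_ec(esr), esr_fsc(esr),
	       esr_is_alignment_dabt(esr), is_unaligned(addr, size));
}
```

Calling esr_report(0x96000021u, 0xffff000009710027ULL, 4) with the values from the oops reports an alignment Data Abort on an address that is not 4-byte aligned, which would be consistent with a 32-bit access into a Device memory mapping of the ACPI tables.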
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 5:54 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-18 5:54 UTC (permalink / raw) To: Dave Young Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, Bhupesh Sharma, kexec, linux-kernel, linux-acpi, James Morse, Bhupesh SHARMA, linux-arm-kernel On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it > to kexec@lists.infradead.org > > Also add linux-acpi list Thank you. > On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > > <ard.biesheuvel@linaro.org> wrote: > > > On 15 December 2017 at 09:59, AKASHI Takahiro > > > <takahiro.akashi@linaro.org> wrote: > > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > > >>> <takahiro.akashi@linaro.org> wrote: > > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > > >>> >> <takahiro.akashi@linaro.org> wrote: > > >>> >> > Bhupesh, Ard, > > >>> >> > > > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > > >>> >> >> Hi Ard, Akashi > > >>> >> >> > > >>> >> > (snip) > > >>> >> > > > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > > >>> >> >> identify its own usable memory and exclude, at its boot time, any > > >>> >> >> other memory areas that are part of the panicked kernel's memory. > > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > > >>> >> >> , for details) > > >>> >> > > > >>> >> > Right. > > >>> >> > > > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > > >>> >> >> with the crashkernel memory range: > > >>> >> >> > > >>> >> >> /* add linux,usable-memory-range */ > > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > > >>> >> >> address_cells, size_cells); > > >>> >> >> > > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > > >>> >> >> , for details) > > >>> >> >> > > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > > >>> >> >> they are marked as System RAM or as RESERVED. As, > > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > > >>> >> >> > > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > > >>> >> >> ACPI memory and crashes while trying to access the same: > > >>> >> >> > > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > > >>> >> >> -r`.img --reuse-cmdline -d > > >>> >> >> > > >>> >> >> [snip..] > > >>> >> >> > > >>> >> >> Reserved memory range > > >>> >> >> 000000000e800000-000000002e7fffff (0) > > >>> >> >> > > >>> >> >> Coredump memory ranges > > >>> >> >> 0000000000000000-000000000e7fffff (0) > > >>> >> >> 000000002e800000-000000003961ffff (0) > > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > > >>> >> >> 000000003ed60000-000000003fbfffff (0) > > >>> >> >> 0000001040000000-0000001ffbffffff (0) > > >>> >> >> 0000002000000000-0000002ffbffffff (0) > > >>> >> >> 0000009000000000-0000009ffbffffff (0) > > >>> >> >> 000000a000000000-000000affbffffff (0) > > >>> >> >> > > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > > >>> >> >> memory cap'ing passed to the crash kernel inside > > >>> >> >> 'arch/arm64/mm/init.c' (see below): > > >>> >> >> > > >>> >> >> static void __init fdt_enforce_memory_region(void) > > >>> >> >> { > > >>> >> >> struct memblock_region reg = { > > >>> >> >> .size = 0, > > >>> >> >> }; > > >>> >> >> > > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); > > >>> >> >> > > >>> >> >> if (reg.size) > > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > > >>> >> >> comment this out */ > > >>> >> >> } > > >>> >> > > > >>> >> > Please just don't do that. It can cause a fatal damage on > > >>> >> > memory contents of the *crashed* kernel. > > >>> >> > > > >>> >> >> 5). Both the above temporary solutions fix the problem. > > >>> >> >> > > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > > >>> >> >> fail. > > >>> >> >> > > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > > >>> >> >> dt node 'linux,usable-memory-range' > > >>> >> > > > >>> >> > I still don't understand why we need to carry over the information > > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > > >>> >> > such regions are free to be reused by the kernel after some point of > > >>> >> > initialization. Why does crash dump kernel need to know about them? > > >>> >> > > > >>> >> > > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > > >>> >> no longer needs them. 
Of course, in order to be able to boot a kexec > > >>> >> kernel, those regions needs to be preserved, which is why they are > > >>> >> memblock_reserve()'d now. > > >>> > > > >>> > For my better understandings, who is actually accessing such regions > > >>> > during boot time, uefi itself or efistub? > > >>> > > > >>> > > >>> No, only the kernel. This is where the ACPI tables are stored. For > > >>> instance, on QEMU we have > > >>> > > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > > >>> 01000013) > > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > > >>> BXPC 00000001) > > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > > >>> BXPC 00000001) > > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > > >>> BXPC 00000001) > > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > > >>> BXPC 00000001) > > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > > >>> BXPC 00000001) > > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > > >>> BXPC 00000001) > > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > > >>> BXPC 00000001) > > >>> > > >>> covered by > > >>> > > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > > >>> ... > > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > > >> > > >> OK. I mistakenly understood those regions could be freed after exiting > > >> UEFI boot services. > > >> > > >>> > > >>> >> So it seems that kexec does not honour the memblock_reserve() table > > >>> >> when booting the next kernel. > > >>> > > > >>> > not really. > > >>> > > > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > > >>> >> > on crash dump kernel?) > > >>> >> > > > >>> >> > > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > > >>> >> regions only revealed the bug, not created it (given that other > > >>> >> memblock_reserve regions may be affected as well) > > >>> > > > >>> > As whether we should honor such reserved regions over kexec'ing > > >>> > depends on each one's specific nature, we will have to take care one-by-one. > > >>> > As a matter of fact, no information about "reserved" memblocks is > > >>> > exposed to user space (via proc/iomem). > > >>> > > > >>> > > >>> That is why I suggested (somewhere in this thread?) to not expose them > > >>> as 'System RAM'. Do you think that could solve this? > > >> > > >> Memblock-reserv'ing them is necessary to prevent their corruption and > > >> marking them under another name in /proc/iomem would also be good in order > > >> not to allocate them as part of crash kernel's memory. > > >> > > > > > > I agree. However, this may not be entirely trivial, since iterating > > > over the memblock_reserved table and creating iomem entries may result > > > in collisions. > > > > I found a method (using the patch I shared earlier in this thread) to mark these > > entries as 'ACPI reclaim memory' ranges rather than System RAM or > > reserved regions. > > > > >> But I'm not still convinced that we should export them in useable- > > >> memory-range to crash dump kernel. They will be accessed through > > >> acpi_os_map_memory() and so won't be required to be part of system ram > > >> (or memblocks), I guess. > > > > > > Agreed. They will be covered by the linear mapping in the boot kernel, > > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > > > which is exactly what we want in this case. > > > > Now this is what is confusing me. I don't see the above happening. 
> > > > I see that the primary kernel boots up and adds the ACPI regions via: > > acpi_os_ioremap > > -> ioremap_cache > > > > But during the crashkernel boot, ''acpi_os_ioremap' calls > > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > > variant. It is natural if that region is out of memblocks. > > And it fails while accessing the ACPI tables: > > > > [ 0.039205] ACPI: Core revision 20170728 > > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. As ioremap() makes the mapping as "Device memory", unaligned memory access won't be allowed. > > [ 0.100022] Modules linked in: > > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > > pstate: 60000045 > > [ 0.132647] sp : ffff000008ccfb40 > > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > > [ 0.211091] x1 : 
ffff000009710027 x0 : 0000000000000000 > > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > > [ 0.223224] Call trace: > > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > > ffff0000095e3980 ffff000008ccfbe0 > > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > > ffff000008ccfc50 0000000000000000 > > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > > 00000000ffffff76 0000000000000006 > > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > > 000000000000038e 0000000000000000 > > [ 0.263843] fa80: 0000000000000000 0000000000000000 > > 0000000000000005 000000000000001b > > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > > ffff000009710027 0000000000000001 > > [ 0.279667] fac0: 0000000000000001 000000000000001b > > 0000000000000000 ffff0000088be820 > > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > > ffff00000849b4f8 ffff000008ccfb40 > > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > > ffff000008ccfb40 ffff000008260a18 > > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > > ffff000008ccfb40 ffff0000084a6764 > > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > > [ 0.382891] 
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c > > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > > [ 0.399160] Kernel panic - not syncing: Fatal exception > > [ 0.404437] Rebooting in 10 seconds. > > > > So, I think the linear mapping done by the primary kernel does not > > make these accessible in the crash kernel directly. > > > > Any pointers? > > Can you get the code line number for acpi_ns_lookup+0x25c? So should we always avoid ioremap() in acpi_os_ioremap() entirely, or modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned accesses? (I didn't find out how unaligned accesses could happen there.) Thanks, -Takahiro AKASHI > > > > Regards, > > Bhupesh > > > > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > > >> via a kernel command line parameter, "memmap=". > > >> > > _______________________________________________ > > kexec mailing list -- kexec@lists.fedoraproject.org > > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 135+ messages in thread
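The ESR value from the oops above can be decoded mechanically. The sketch below assumes the ESR_ELx field layout from the Arm architecture manual (exception class in bits [31:26], data fault status code in bits [5:0] for data aborts). The macro and function names are made up for illustration and are not the kernel's own helpers.

```c
#include <stdint.h>

/* ESR_ELx field positions (assumed from the Arm architecture manual). */
#define ESR_EC_SHIFT	26
#define ESR_EC_MASK	0x3fu
#define ESR_DFSC_MASK	0x3fu

#define ESR_EC_DABT_CUR	0x25u	/* Data Abort taken without a change in exception level */
#define ESR_DFSC_ALIGN	0x21u	/* Alignment fault */

/* Exception class: what kind of exception was taken. */
static inline unsigned int esr_ec(uint32_t esr)
{
	return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Data fault status code: why the data abort happened. */
static inline unsigned int esr_dfsc(uint32_t esr)
{
	return esr & ESR_DFSC_MASK;
}

/* True if the ESR describes an alignment fault raised by kernel-mode code. */
static inline int esr_is_kernel_alignment_fault(uint32_t esr)
{
	return esr_ec(esr) == ESR_EC_DABT_CUR && esr_dfsc(esr) == ESR_DFSC_ALIGN;
}

/* True if an access of 'size' bytes (a power of two) at 'addr' is misaligned. */
static inline int is_misaligned(uint64_t addr, unsigned int size)
{
	return (addr & (uint64_t)(size - 1)) != 0;
}
```

For 0x96000021 this gives EC = 0x25 and DFSC = 0x21, which matches the "Data Abort and Alignment fault" reading above. As a further hint, and assuming the decode is right, the faulting instruction word in the Code: line (b94002c0) looks like a 32-bit load through x22, and x22 holds ffff000009710027, which is not 4-byte aligned. Such an access is harmless on Normal memory but faults on a Device mapping.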
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 5:54 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-18 5:54 UTC (permalink / raw) To: Dave Young Cc: Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it > to kexec@lists.infradead.org > > Also add linux-acpi list Thank you. > On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > > <ard.biesheuvel@linaro.org> wrote: > > > On 15 December 2017 at 09:59, AKASHI Takahiro > > > <takahiro.akashi@linaro.org> wrote: > > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > > >>> <takahiro.akashi@linaro.org> wrote: > > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > > >>> >> <takahiro.akashi@linaro.org> wrote: > > >>> >> > Bhupesh, Ard, > > >>> >> > > > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > > >>> >> >> Hi Ard, Akashi > > >>> >> >> > > >>> >> > (snip) > > >>> >> > > > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > > >>> >> >> identify its own usable memory and exclude, at its boot time, any > > >>> >> >> other memory areas that are part of the panicked kernel's memory. > > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > > >>> >> >> , for details) > > >>> >> > > > >>> >> > Right. > > >>> >> > > > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > > >>> >> >> with the crashkernel memory range: > > >>> >> >> > > >>> >> >> /* add linux,usable-memory-range */ > > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > > >>> >> >> address_cells, size_cells); > > >>> >> >> > > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > > >>> >> >> , for details) > > >>> >> >> > > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > > >>> >> >> they are marked as System RAM or as RESERVED. As, > > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > > >>> >> >> > > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > > >>> >> >> ACPI memory and crashes while trying to access the same: > > >>> >> >> > > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > > >>> >> >> -r`.img --reuse-cmdline -d > > >>> >> >> > > >>> >> >> [snip..] > > >>> >> >> > > >>> >> >> Reserved memory range > > >>> >> >> 000000000e800000-000000002e7fffff (0) > > >>> >> >> > > >>> >> >> Coredump memory ranges > > >>> >> >> 0000000000000000-000000000e7fffff (0) > > >>> >> >> 000000002e800000-000000003961ffff (0) > > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > > >>> >> >> 000000003ed60000-000000003fbfffff (0) > > >>> >> >> 0000001040000000-0000001ffbffffff (0) > > >>> >> >> 0000002000000000-0000002ffbffffff (0) > > >>> >> >> 0000009000000000-0000009ffbffffff (0) > > >>> >> >> 000000a000000000-000000affbffffff (0) > > >>> >> >> > > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > > >>> >> >> memory cap'ing passed to the crash kernel inside > > >>> >> >> 'arch/arm64/mm/init.c' (see below): > > >>> >> >> > > >>> >> >> static void __init fdt_enforce_memory_region(void) > > >>> >> >> { > > >>> >> >> struct memblock_region reg = { > > >>> >> >> .size = 0, > > >>> >> >> }; > > >>> >> >> > > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > >>> >> >> > > >>> >> >> if (reg.size) > > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > > >>> >> >> comment this out */ > > >>> >> >> } > > >>> >> > > > >>> >> > Please just don't do that. It can cause a fatal damage on > > >>> >> > memory contents of the *crashed* kernel. > > >>> >> > > > >>> >> >> 5). Both the above temporary solutions fix the problem. > > >>> >> >> > > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > > >>> >> >> fail. > > >>> >> >> > > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > > >>> >> >> dt node 'linux,usable-memory-range' > > >>> >> > > > >>> >> > I still don't understand why we need to carry over the information > > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > > >>> >> > such regions are free to be reused by the kernel after some point of > > >>> >> > initialization. Why does crash dump kernel need to know about them? > > >>> >> > > > >>> >> > > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > > >>> >> no longer needs them.
Of course, in order to be able to boot a kexec > > >>> >> kernel, those regions needs to be preserved, which is why they are > > >>> >> memblock_reserve()'d now. > > >>> > > > >>> > For my better understandings, who is actually accessing such regions > > >>> > during boot time, uefi itself or efistub? > > >>> > > > >>> > > >>> No, only the kernel. This is where the ACPI tables are stored. For > > >>> instance, on QEMU we have > > >>> > > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > > >>> 01000013) > > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > > >>> BXPC 00000001) > > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > > >>> BXPC 00000001) > > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > > >>> BXPC 00000001) > > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > > >>> BXPC 00000001) > > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > > >>> BXPC 00000001) > > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > > >>> BXPC 00000001) > > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > > >>> BXPC 00000001) > > >>> > > >>> covered by > > >>> > > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > > >>> ... > > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > > >> > > >> OK. I mistakenly understood those regions could be freed after exiting > > >> UEFI boot services. > > >> > > >>> > > >>> >> So it seems that kexec does not honour the memblock_reserve() table > > >>> >> when booting the next kernel. > > >>> > > > >>> > not really. > > >>> > > > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > > >>> >> > on crash dump kernel?) > > >>> >> > > > >>> >> > > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > > >>> >> regions only revealed the bug, not created it (given that other > > >>> >> memblock_reserve regions may be affected as well) > > >>> > > > >>> > As whether we should honor such reserved regions over kexec'ing > > >>> > depends on each one's specific nature, we will have to take care one-by-one. > > >>> > As a matter of fact, no information about "reserved" memblocks is > > >>> > exposed to user space (via proc/iomem). > > >>> > > > >>> > > >>> That is why I suggested (somewhere in this thread?) to not expose them > > >>> as 'System RAM'. Do you think that could solve this? > > >> > > >> Memblock-reserv'ing them is necessary to prevent their corruption and > > >> marking them under another name in /proc/iomem would also be good in order > > >> not to allocate them as part of crash kernel's memory. > > >> > > > > > > I agree. However, this may not be entirely trivial, since iterating > > > over the memblock_reserved table and creating iomem entries may result > > > in collisions. > > > > I found a method (using the patch I shared earlier in this thread) to mark these > > entries as 'ACPI reclaim memory' ranges rather than System RAM or > > reserved regions. > > > > >> But I'm not still convinced that we should export them in useable- > > >> memory-range to crash dump kernel. They will be accessed through > > >> acpi_os_map_memory() and so won't be required to be part of system ram > > >> (or memblocks), I guess. > > > > > > Agreed. They will be covered by the linear mapping in the boot kernel, > > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > > > which is exactly what we want in this case. > > > > Now this is what is confusing me. I don't see the above happening. 
> > > > I see that the primary kernel boots up and adds the ACPI regions via: > > acpi_os_ioremap > > -> ioremap_cache > > > > But during the crashkernel boot, ''acpi_os_ioremap' calls > > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > > variant. It is natural if that region is out of memblocks. > > And it fails while accessing the ACPI tables: > > > > [ 0.039205] ACPI: Core revision 20170728 > > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. As ioremap() makes the mapping as "Device memory", unaligned memory access won't be allowed. > > [ 0.100022] Modules linked in: > > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > > pstate: 60000045 > > [ 0.132647] sp : ffff000008ccfb40 > > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > > [ 0.211091] x1 : 
ffff000009710027 x0 : 0000000000000000 > > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > > [ 0.223224] Call trace: > > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > > ffff0000095e3980 ffff000008ccfbe0 > > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > > ffff000008ccfc50 0000000000000000 > > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > > 00000000ffffff76 0000000000000006 > > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > > 000000000000038e 0000000000000000 > > [ 0.263843] fa80: 0000000000000000 0000000000000000 > > 0000000000000005 000000000000001b > > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > > ffff000009710027 0000000000000001 > > [ 0.279667] fac0: 0000000000000001 000000000000001b > > 0000000000000000 ffff0000088be820 > > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > > ffff00000849b4f8 ffff000008ccfb40 > > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > > ffff000008ccfb40 ffff000008260a18 > > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > > ffff000008ccfb40 ffff0000084a6764 > > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > > [ 0.382891] 
[<ffff000008b70d50>] start_kernel+0x3b4/0x43c > > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > > [ 0.399160] Kernel panic - not syncing: Fatal exception > > [ 0.404437] Rebooting in 10 seconds. > > > > So, I think the linear mapping done by the primary kernel does not > > make these accessible in the crash kernel directly. > > > > Any pointers? > > Can you get the code line number for acpi_ns_lookup+0x25c? So should we always avoid ioremap() in acpi_os_ioremap() entirely, or modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned accesses? (I didn't find out how unaligned accesses could happen there.) Thanks, -Takahiro AKASHI > > > > Regards, > > Bhupesh > > > > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > > >> via a kernel command line parameter, "memmap=". > > >> > > _______________________________________________ > > kexec mailing list -- kexec@lists.fedoraproject.org > > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
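Point 1) of the quoted message describes kexec-tools writing a single (base, size) pair into 'linux,usable-memory-range'. As a rough illustration of what such a property holds, the sketch below encodes one range as big-endian 32-bit cells, the way devicetree properties are laid out. It is not the kexec-tools implementation; put_cells() and encode_mem_range() are hypothetical names, and only 1 or 2 cells per value are handled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write 'val' as 'cells' big-endian 32-bit cells.
 * Returns the number of bytes written, or 0 if the cell count is
 * unsupported or 'val' does not fit.
 */
static size_t put_cells(uint8_t *out, uint64_t val, unsigned int cells)
{
	if (cells != 1 && cells != 2)
		return 0;
	if (cells == 1 && val > UINT32_MAX)
		return 0;

	for (unsigned int i = 0; i < cells; i++) {
		/* Most significant cell first. */
		uint32_t cell = (uint32_t)(val >> (32 * (cells - 1 - i)));

		out[4 * i + 0] = (uint8_t)(cell >> 24);
		out[4 * i + 1] = (uint8_t)(cell >> 16);
		out[4 * i + 2] = (uint8_t)(cell >> 8);
		out[4 * i + 3] = (uint8_t)cell;
	}
	return 4 * (size_t)cells;
}

/*
 * Encode one memory range as <base size>.
 * 'out' must have room for 4 * (address_cells + size_cells) bytes.
 * Returns the total number of bytes written, or 0 on error.
 */
static size_t encode_mem_range(uint8_t *out, uint64_t base, uint64_t size,
			       unsigned int address_cells,
			       unsigned int size_cells)
{
	size_t n = put_cells(out, base, address_cells);
	size_t m;

	if (!n)
		return 0;
	m = put_cells(out + n, size, size_cells);
	if (!m)
		return 0;
	return n + m;
}
```

With the "Reserved memory range" from the kexec debug output (000000000e800000-000000002e7fffff), the pair would be base 0x0e800000 and size 0x20000000. Nothing outside that window, including the ACPI reclaim regions, is described to the crash kernel by this property.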
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-18 5:54 ` AKASHI Takahiro (?) (?) @ 2017-12-18 8:59 ` Bhupesh SHARMA -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh SHARMA @ 2017-12-18 8:59 UTC (permalink / raw) To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, James Morse, Bhupesh SHARMA, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it >> to kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org >> >> Also add linux-acpi list > > Thank you. > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> > <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> > >>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> > >>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > >>> >> > Bhupesh, Ard, >> > >>> >> > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> > >>> >> >> Hi Ard, Akashi >> > >>> >> >> >> > >>> >> > (snip) >> > >>> >> > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> > >>> >> >> 
'linux,usable-memory-range' dt property to allow crash dump kernel to >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> > >>> >> >> , for details) >> > >>> >> > >> > >>> >> > Right. >> > >>> >> > >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> > >>> >> >> with the crashkernel memory range: >> > >>> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> > >>> >> >> address_cells, size_cells); >> > >>> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> > >>> >> >> , for details) >> > >>> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> > >>> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> > >>> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> > >>> >> >> -r`.img --reuse-cmdline -d >> > >>> >> >> >> > >>> >> >> [snip..] 
>> > >>> >> >> >> > >>> >> >> Reserved memory range >> > >>> >> >> 000000000e800000-000000002e7fffff (0) >> > >>> >> >> >> > >>> >> >> Coredump memory ranges >> > >>> >> >> 0000000000000000-000000000e7fffff (0) >> > >>> >> >> 000000002e800000-000000003961ffff (0) >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) >> > >>> >> >> 000000a000000000-000000affbffffff (0) >> > >>> >> >> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> > >>> >> >> memory cap'ing passed to the crash kernel inside >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): >> > >>> >> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) >> > >>> >> >> { >> > >>> >> >> struct memblock_region reg = { >> > >>> >> >> .size = 0, >> > >>> >> >> }; >> > >>> >> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >> > >>> >> >> >> > >>> >> >> if (reg.size) >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> > >>> >> >> comment this out */ >> > >>> >> >> } >> > >>> >> > >> > >>> >> > Please just don't do that. It can cause a fatal damage on >> > >>> >> > memory contents of the *crashed* kernel. >> > >>> >> > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >> > >>> >> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> > >>> >> >> fail. >> > >>> >> >> >> > >>> >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> > >>> >> >> dt node 'linux,usable-memory-range' >> > >>> >> > >> > >>> >> > I still don't understand why we need to carry over the information >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> > >>> >> > such regions are free to be reused by the kernel after some point of >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> > >>> >> > >> > >>> >> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> > >>> >> memblock_reserve()'d now. >> > >>> > >> > >>> > For my better understandings, who is actually accessing such regions >> > >>> > during boot time, uefi itself or efistub? >> > >>> > >> > >>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For >> > >>> instance, on QEMU we have >> > >>> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> > >>> 01000013) >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> > >>> BXPC 00000001) >> > >>> >> > >>> covered by >> > >>> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> > >>> ... >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> > >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> > >> UEFI boot services. >> > >> >> > >>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> > >>> >> when booting the next kernel. >> > >>> > >> > >>> > not really. >> > >>> > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> > >>> >> > on crash dump kernel?) >> > >>> >> > >> > >>> >> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim >> > >>> >> regions only revealed the bug, not created it (given that other >> > >>> >> memblock_reserve regions may be affected as well) >> > >>> > >> > >>> > As whether we should honor such reserved regions over kexec'ing >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. 
>> > >>> > As a matter of fact, no information about "reserved" memblocks is >> > >>> > exposed to user space (via proc/iomem). >> > >>> > >> > >>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> > >>> as 'System RAM'. Do you think that could solve this? >> > >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> > >> marking them under another name in /proc/iomem would also be good in order >> > >> not to allocate them as part of crash kernel's memory. >> > >> >> > > >> > > I agree. However, this may not be entirely trivial, since iterating >> > > over the memblock_reserved table and creating iomem entries may result >> > > in collisions. >> > >> > I found a method (using the patch I shared earlier in this thread) to mark these >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> > reserved regions. >> > >> > >> But I'm not still convinced that we should export them in useable- >> > >> memory-range to crash dump kernel. They will be accessed through >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> > >> (or memblocks), I guess. >> > > >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> > > which is exactly what we want in this case. >> > >> > Now this is what is confusing me. I don't see the above happening. >> > >> > I see that the primary kernel boots up and adds the ACPI regions via: >> > acpi_os_ioremap >> > -> ioremap_cache >> > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> > variant. > > It is natural if that region is out of memblocks. Thanks for the confirmation. This was my understanding as well. 
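A minimal sketch of the point above, assuming the '*pte' value printed in the oops quoted below (00e8000039710707) belongs to the faulting ACPI mapping. It extracts the output address and the AttrIndx field from that descriptor and checks the address against two ranges taken from this thread: the crashkernel reservation (0e800000-2e7fffff) and the first ACPI range in the extended 'linux,usable-memory-range' dump (base 396c0000, size a0000). The helper names and the struct are hypothetical, not kernel APIs. The mask covers bits [47:12]; bits [15:12] are zero in this descriptor, so the result is the same for a 4K or a 64K granule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: one physical range, [base, base + size). */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Output address field of a stage 1 page descriptor, bits [47:12]. */
static uint64_t pte_to_phys(uint64_t pte)
{
	return pte & 0x0000fffffffff000ULL;
}

/* AttrIndx field of a stage 1 page descriptor, bits [4:2]. */
static unsigned int pte_attr_index(uint64_t pte)
{
	return (unsigned int)((pte >> 2) & 0x7);
}

/* True if phys lies inside the range r. */
static bool range_contains(const struct mem_range *r, uint64_t phys)
{
	return phys >= r->base && phys - r->base < r->size;
}
```

With the values from this thread, the page sits at physical 0x39710000, outside the crashkernel reservation, so it is not part of the crash kernel's memblocks. AttrIndx is 1, which as far as I can tell is the Device-nGnRE index (MT_DEVICE_nGnRE) in kernels of this era; treat that mapping of index to memory type as an assumption.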
>> > And it fails while accessing the ACPI tables: >> > >> > [ 0.039205] ACPI: Core revision 20170728 >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > As ioremap() makes the mapping as "Device memory", unaligned memory > access won't be allowed. > >> > [ 0.100022] Modules linked in: >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> > pstate: 60000045 >> > [ 0.132647] sp : ffff000008ccfb40 >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> > [ 0.223224] Call trace: >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> > 
ffff0000095e3980 ffff000008ccfbe0 >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> > ffff000008ccfc50 0000000000000000 >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> > 00000000ffffff76 0000000000000006 >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> > 000000000000038e 0000000000000000 >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> > 0000000000000005 000000000000001b >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> > ffff000009710027 0000000000000001 >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> > 0000000000000000 ffff0000088be820 >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> > ffff00000849b4f8 ffff000008ccfb40 >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> > ffff000008ccfb40 ffff000008260a18 >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> > ffff000008ccfb40 ffff0000084a6764 >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> > [ 0.404437] Rebooting 
in 10 seconds. >> > >> > So, I think the linear mapping done by the primary kernel does not >> > make these accessible in the crash kernel directly. >> > >> > Any pointers? >> >> Can you get the code line number for acpi_ns_lookup+0x25c? > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned > accesses? > (I didn't find out how unaligned accesses could happen there.) > Right. Like I captured somewhere in this thread (perhaps the first email on this subject), this is indeed an unaligned address access. Now, modifying acpi_os_ioremap() to not call ioremap(), and thus avoiding mapping this memory range as device memory, doesn't seem a neat solution, as it means we are not marking something with the right memory attribute and we can run into similar/related issues later. Regarding the latter suggestion, what I am seeing now is that the acpi table access functions are perhaps reused from the earlier x86 implementation, but on the arm64 (or even arm) arch we should not be allowing unaligned accesses, which might cause UNDEFINED behaviour and a resultant crash. So I can try this approach and see if it works for me. However, I am still not very sure as to why the crashkernel ranges historically do not include the System RAM regions (which may include the ACPI regions as well). These regions are available for kernel usage and perhaps should be exported to the crashkernel as well. I am not fully aware of the previous discussions on capping the crashkernel memory being passed to the kdump kernel, but did we run into any issues while doing so?
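The alignment fault discussed above can be checked by hand from the oops. The following is a small sketch, not kernel code: it splits the ESR value (0x96000021) into its exception class and fault status code, decodes the faulting instruction word from the 'Code:' line (b94002c0) as a 32-bit LDR with unsigned immediate offset, and tests the base register value (x22 = ffff000009710027) for 4-byte alignment. All function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception class, ESR bits [31:26]. 0x25 is a Data Abort taken
 * without a change in exception level. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Fault status code, ESR bits [5:0]. 0x21 is an alignment fault. */
static unsigned int esr_fsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* True if insn is "LDR Wt, [Xn, #imm]" (32-bit, unsigned offset form). */
static bool is_ldr_w_uimm(uint32_t insn)
{
	return (insn & 0xffc00000u) == 0xb9400000u;
}

/* Base register number Rn, instruction bits [9:5]. */
static unsigned int ldr_rn(uint32_t insn)
{
	return (insn >> 5) & 0x1f;
}

/* Byte offset: imm12 (bits [21:10]) scaled by the 4-byte access size. */
static unsigned int ldr_w_offset(uint32_t insn)
{
	return ((insn >> 10) & 0xfff) * 4;
}

/* True if addr is a multiple of access_bytes (a power of two). */
static bool is_aligned(uint64_t addr, unsigned int access_bytes)
{
	return (addr & (uint64_t)(access_bytes - 1)) == 0;
}
```

For the values in this oops, the instruction decodes to a 4-byte load from [x22] with offset 0, and x22 ends in 0x27, so the load address is not 4-byte aligned. On a Device mapping that is enough to raise the alignment fault reported in the ESR.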
Also, even if I extend the kexec-tools to modify the linux,usable-memory-range and add the ACPI regions to it, the crashkernel fails to boot with the below message (I have added some logic to print the DTB on the crash kernel boot start): [ 0.000000] chosen { [ 0.000000] linux,usable-memory-range [ 0.000000] = < [ 0.000000] 0x00000000 [ 0.000000] 0x0e800000 [ 0.000000] 0x00000000 [ 0.000000] 0x20000000 [ 0.000000] 0x00000000 [ 0.000000] 0x396c0000 [ 0.000000] 0x00000000 [ 0.000000] 0x000a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x39770000 [ 0.000000] 0x00000000 [ 0.000000] 0x00040000 [ 0.000000] 0x00000000 [ 0.000000] 0x398a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x00020000 [ 0.000000] > [ 0.000000] ; [snip..] [ 0.000000] linux,usable-memory-range base e800000, size 20000000 [ 0.000000] - e800000 , 20000000 [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 [ 0.000000] - 396c0000 , a0000 [ 0.000000] linux,usable-memory-range base 39770000, size 40000 [ 0.000000] - 39770000 , 40000 [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 [ 0.000000] - 398a0000 , 20000 [ 0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ... 
[ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 arm64_memblock_init+0x210/0x484 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] pstate: 600000c5 [ 0.000000] sp : ffff000008ccfe80 [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 [ 0.000000] Call trace: [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) [ 0.000000] fd40: 0000000000000056 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] fd60: 0000000000000001 ffff000008c96360 000000000000000d 746f6f622072756f [ 0.000000] fd80: ffff000008517414 00000000000000f4 2065687420616976 6d207261656e696c [ 0.000000] fda0: 2d20676e69707061 657361656c70202d 79206b6365686320 000000002be00842 [ 0.000000] fdc0: ffff000008d05580 0000000000000000 000000000c283806 ffff000008afa000 [ 0.000000] fde0: ffff000008080000 ffff000008afa000 ffff000009680000 ffff000008ec0000 [ 0.000000] fe00: 
ffff000008cf3000 000000000fe80000 00000000013b0000 0000000011230000 [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 ffff000008b76984 ffff000008ccfe80 [ 0.000000] fe40: ffff000008b76984 00000000600000c5 ffff00000959b7a8 ffff000008ec0000 [ 0.000000] fe60: ffffffffffffffff 0000000000000005 ffff000008ccfe80 ffff000008b76984 [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x50/0x6c with crng_init=0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr [ 0.000000] cma: Failed to reserve 512 MiB [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W ------------ 4.14.0+ #7 [ 0.000000] Call trace: [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] I guess it is because of the 1G alignment requirement between the kernel image and the initrd and how we populate the holes between the kernel image, segments (including dtb) and the initrd from the kexec-tools. Akashi, any pointers on this will be helpful as well. 
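For reference, the cell dump of 'linux,usable-memory-range' printed above can be read as (base, size) pairs. This is a rough sketch, assuming #address-cells = 2 and #size-cells = 2 (which matches the "base ..., size ..." lines in the log) and assuming the cells have already been converted to host byte order. The struct and function names are hypothetical and are not taken from kexec-tools or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one decoded (base, size) pair. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine two 32-bit cells (high cell first) into one 64-bit value. */
static uint64_t cells_to_u64(const uint32_t *cells)
{
	return ((uint64_t)cells[0] << 32) | cells[1];
}

/*
 * Decode groups of four cells into ranges. Stops at the first
 * incomplete group or when out[] is full; returns the number
 * of ranges written.
 */
static size_t decode_usable_ranges(const uint32_t *cells, size_t ncells,
				   struct mem_range *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i + 4 <= ncells && n < max_out; i += 4) {
		out[n].base = cells_to_u64(&cells[i]);
		out[n].size = cells_to_u64(&cells[i + 2]);
		n++;
	}
	return n;
}
```

Fed the sixteen cells from the dump, this yields four ranges, the first being the crashkernel reservation (base e800000, size 20000000) followed by the three ACPI ranges.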
Regards, Bhupesh >> > >> > Regards, >> > Bhupesh >> > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> > >> via a kernel command line parameter, "memmap=". >> > >> >> > _______________________________________________ >> > kexec mailing list -- kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org >> > To unsubscribe send an email to kexec-leave-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 8:59 ` Bhupesh SHARMA 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh SHARMA @ 2017-12-18 8:59 UTC (permalink / raw) To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it >> to kexec@lists.infradead.org >> >> Also add linux-acpi list > > Thank you. > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> > <ard.biesheuvel@linaro.org> wrote: >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> > > <takahiro.akashi@linaro.org> wrote: >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> > >>> <takahiro.akashi@linaro.org> wrote: >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> > >>> >> <takahiro.akashi@linaro.org> wrote: >> > >>> >> > Bhupesh, Ard, >> > >>> >> > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> > >>> >> >> Hi Ard, Akashi >> > >>> >> >> >> > >>> >> > (snip) >> > >>> >> > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. 
>> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> > >>> >> >> , for details) >> > >>> >> > >> > >>> >> > Right. >> > >>> >> > >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> > >>> >> >> with the crashkernel memory range: >> > >>> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> > >>> >> >> address_cells, size_cells); >> > >>> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> > >>> >> >> , for details) >> > >>> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> > >>> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> > >>> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> > >>> >> >> -r`.img --reuse-cmdline -d >> > >>> >> >> >> > >>> >> >> [snip..] 
>> > >>> >> >> >> > >>> >> >> Reserved memory range >> > >>> >> >> 000000000e800000-000000002e7fffff (0) >> > >>> >> >> >> > >>> >> >> Coredump memory ranges >> > >>> >> >> 0000000000000000-000000000e7fffff (0) >> > >>> >> >> 000000002e800000-000000003961ffff (0) >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) >> > >>> >> >> 000000a000000000-000000affbffffff (0) >> > >>> >> >> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> > >>> >> >> memory cap'ing passed to the crash kernel inside >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): >> > >>> >> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) >> > >>> >> >> { >> > >>> >> >> struct memblock_region reg = { >> > >>> >> >> .size = 0, >> > >>> >> >> }; >> > >>> >> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >> > >>> >> >> >> > >>> >> >> if (reg.size) >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> > >>> >> >> comment this out */ >> > >>> >> >> } >> > >>> >> > >> > >>> >> > Please just don't do that. It can cause a fatal damage on >> > >>> >> > memory contents of the *crashed* kernel. >> > >>> >> > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >> > >>> >> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> > >>> >> >> fail. >> > >>> >> >> >> > >>> >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> > >>> >> >> dt node 'linux,usable-memory-range' >> > >>> >> > >> > >>> >> > I still don't understand why we need to carry over the information >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> > >>> >> > such regions are free to be reused by the kernel after some point of >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> > >>> >> > >> > >>> >> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> > >>> >> memblock_reserve()'d now. >> > >>> > >> > >>> > For my better understandings, who is actually accessing such regions >> > >>> > during boot time, uefi itself or efistub? >> > >>> > >> > >>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For >> > >>> instance, on QEMU we have >> > >>> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> > >>> 01000013) >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> > >>> BXPC 00000001) >> > >>> >> > >>> covered by >> > >>> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> > >>> ... >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> > >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> > >> UEFI boot services. >> > >> >> > >>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> > >>> >> when booting the next kernel. >> > >>> > >> > >>> > not really. >> > >>> > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> > >>> >> > on crash dump kernel?) >> > >>> >> > >> > >>> >> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim >> > >>> >> regions only revealed the bug, not created it (given that other >> > >>> >> memblock_reserve regions may be affected as well) >> > >>> > >> > >>> > As whether we should honor such reserved regions over kexec'ing >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. 
>> > >>> > As a matter of fact, no information about "reserved" memblocks is >> > >>> > exposed to user space (via proc/iomem). >> > >>> > >> > >>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> > >>> as 'System RAM'. Do you think that could solve this? >> > >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> > >> marking them under another name in /proc/iomem would also be good in order >> > >> not to allocate them as part of crash kernel's memory. >> > >> >> > > >> > > I agree. However, this may not be entirely trivial, since iterating >> > > over the memblock_reserved table and creating iomem entries may result >> > > in collisions. >> > >> > I found a method (using the patch I shared earlier in this thread) to mark these >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> > reserved regions. >> > >> > >> But I'm not still convinced that we should export them in useable- >> > >> memory-range to crash dump kernel. They will be accessed through >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> > >> (or memblocks), I guess. >> > > >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> > > which is exactly what we want in this case. >> > >> > Now this is what is confusing me. I don't see the above happening. >> > >> > I see that the primary kernel boots up and adds the ACPI regions via: >> > acpi_os_ioremap >> > -> ioremap_cache >> > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> > variant. > > It is natural if that region is out of memblocks. Thanks for the confirmation. This was my understanding as well. 
>> > And it fails while accessing the ACPI tables: >> > >> > [ 0.039205] ACPI: Core revision 20170728 >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > As ioremap() makes the mapping as "Device memory", unaligned memory > access won't be allowed. > >> > [ 0.100022] Modules linked in: >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> > pstate: 60000045 >> > [ 0.132647] sp : ffff000008ccfb40 >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> > [ 0.223224] Call trace: >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> > 
ffff0000095e3980 ffff000008ccfbe0 >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> > ffff000008ccfc50 0000000000000000 >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> > 00000000ffffff76 0000000000000006 >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> > 000000000000038e 0000000000000000 >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> > 0000000000000005 000000000000001b >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> > ffff000009710027 0000000000000001 >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> > 0000000000000000 ffff0000088be820 >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> > ffff00000849b4f8 ffff000008ccfb40 >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> > ffff000008ccfb40 ffff000008260a18 >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> > ffff000008ccfb40 ffff0000084a6764 >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> > [ 0.404437] Rebooting 
in 10 seconds. >> > >> > So, I think the linear mapping done by the primary kernel does not >> > make these accessible in the crash kernel directly. >> > >> > Any pointers? >> >> Can you get the code line number for acpi_ns_lookup+0x25c? > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned > accesses? > (I didn't find out how unaligned accesses could happen there.) > Right. As I noted somewhere in this thread (perhaps the first email on this subject), this is indeed an unaligned address access. Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding assigning this memory range as device memory doesn't seem a neat solution as it means we are not marking something with the right memory attribute and we can run into similar/related issues later. Regarding the latter suggestion, what I am seeing now is that the acpi table access functions are perhaps reused from the earlier x86 implementation, but on the arm64 (or even arm) arch we should not be allowing unaligned accesses which might cause UNDEFINED behaviour and a resultant crash. So I can try this approach and see if it works for me. However, I am still not very sure as to why the crashkernel ranges historically do not include the System RAM regions (which may include the ACPI regions as well). These regions are available for the kernel usage and perhaps should be exported to the crashkernel as well. I am not fully aware of the previous discussions on capp'ing the crashkernel memory being passed to the kdump kernel, but did we run into any issues while doing so? 
Also, even if I extend the kexec-tools to modify the linux,usable-memory-range and add the ACPI regions to it, the crashkernel fails to boot with the below message (I have added some logic to print the DTB on the crash kernel boot start): [ 0.000000] chosen { [ 0.000000] linux,usable-memory-range [ 0.000000] = < [ 0.000000] 0x00000000 [ 0.000000] 0x0e800000 [ 0.000000] 0x00000000 [ 0.000000] 0x20000000 [ 0.000000] 0x00000000 [ 0.000000] 0x396c0000 [ 0.000000] 0x00000000 [ 0.000000] 0x000a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x39770000 [ 0.000000] 0x00000000 [ 0.000000] 0x00040000 [ 0.000000] 0x00000000 [ 0.000000] 0x398a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x00020000 [ 0.000000] > [ 0.000000] ; [snip..] [ 0.000000] linux,usable-memory-range base e800000, size 20000000 [ 0.000000] - e800000 , 20000000 [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 [ 0.000000] - 396c0000 , a0000 [ 0.000000] linux,usable-memory-range base 39770000, size 40000 [ 0.000000] - 39770000 , 40000 [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 [ 0.000000] - 398a0000 , 20000 [ 0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ... 
[ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 arm64_memblock_init+0x210/0x484 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] pstate: 600000c5 [ 0.000000] sp : ffff000008ccfe80 [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 [ 0.000000] Call trace: [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) [ 0.000000] fd40: 0000000000000056 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] fd60: 0000000000000001 ffff000008c96360 000000000000000d 746f6f622072756f [ 0.000000] fd80: ffff000008517414 00000000000000f4 2065687420616976 6d207261656e696c [ 0.000000] fda0: 2d20676e69707061 657361656c70202d 79206b6365686320 000000002be00842 [ 0.000000] fdc0: ffff000008d05580 0000000000000000 000000000c283806 ffff000008afa000 [ 0.000000] fde0: ffff000008080000 ffff000008afa000 ffff000009680000 ffff000008ec0000 [ 0.000000] fe00: 
ffff000008cf3000 000000000fe80000 00000000013b0000 0000000011230000 [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 ffff000008b76984 ffff000008ccfe80 [ 0.000000] fe40: ffff000008b76984 00000000600000c5 ffff00000959b7a8 ffff000008ec0000 [ 0.000000] fe60: ffffffffffffffff 0000000000000005 ffff000008ccfe80 ffff000008b76984 [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x50/0x6c with crng_init=0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr [ 0.000000] cma: Failed to reserve 512 MiB [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W ------------ 4.14.0+ #7 [ 0.000000] Call trace: [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] I guess it is because of the 1G alignment requirement between the kernel image and the initrd and how we populate the holes between the kernel image, segments (including dtb) and the initrd from the kexec-tools. Akashi, any pointers on this will be helpful as well. 
Regards, Bhupesh >> > >> > Regards, >> > Bhupesh >> > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> > >> via a kernel command line parameter, "memmap=". >> > >> ^ permalink raw reply [flat|nested] 135+ messages in thread
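The ESR value in the oops quoted above (0x96000021) can be decoded by hand to confirm the "Data Abort and Alignment fault" reading. The following is only a minimal sketch, assuming the ARMv8-A ESR_ELx layout (exception class in bits [31:26], data fault status code in bits [5:0]); the helper names are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers (not kernel APIs); field positions follow ESR_ELx. */

/* Exception Class: bits [31:26]. */
static inline uint32_t esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data Fault Status Code: bits [5:0] of the ISS for data aborts. */
static inline uint32_t esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* EC 0x24 / 0x25: Data Abort from a lower / the current exception level. */
static inline bool esr_is_data_abort(uint32_t esr)
{
	uint32_t ec = esr_ec(esr);

	return ec == 0x24 || ec == 0x25;
}

/* DFSC 0x21 on a data abort: alignment fault. */
static inline bool esr_is_alignment_fault(uint32_t esr)
{
	return esr_is_data_abort(esr) && esr_dfsc(esr) == 0x21;
}
```

For 0x96000021 this gives EC 0x25 (data abort taken at the current exception level) and DFSC 0x21 (alignment fault), which is consistent with an unaligned access to a range that ioremap() mapped as Device memory.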
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 8:59 ` Bhupesh SHARMA 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh SHARMA @ 2017-12-18 8:59 UTC (permalink / raw) To: linux-arm-kernel On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> kexec at fedoraproject... is for Fedora kexec scripts discussion, changed it >> to kexec at lists.infradead.org >> >> Also add linux-acpi list > > Thank you. > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> > <ard.biesheuvel@linaro.org> wrote: >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> > > <takahiro.akashi@linaro.org> wrote: >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> > >>> <takahiro.akashi@linaro.org> wrote: >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> > >>> >> <takahiro.akashi@linaro.org> wrote: >> > >>> >> > Bhupesh, Ard, >> > >>> >> > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> > >>> >> >> Hi Ard, Akashi >> > >>> >> >> >> > >>> >> > (snip) >> > >>> >> > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> > >>> >> >> , for details) >> > >>> >> > >> > >>> >> > Right. >> > >>> >> > >> > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >> > >>> >> >> with the crashkernel memory range: >> > >>> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> > >>> >> >> address_cells, size_cells); >> > >>> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> > >>> >> >> , for details) >> > >>> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> > >>> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> > >>> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> > >>> >> >> -r`.img --reuse-cmdline -d >> > >>> >> >> >> > >>> >> >> [snip..] >> > >>> >> >> >> > >>> >> >> Reserved memory range >> > >>> >> >> 000000000e800000-000000002e7fffff (0) >> > >>> >> >> >> > >>> >> >> Coredump memory ranges >> > >>> >> >> 0000000000000000-000000000e7fffff (0) >> > >>> >> >> 000000002e800000-000000003961ffff (0) >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) >> > >>> >> >> 000000a000000000-000000affbffffff (0) >> > >>> >> >> >> > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the >> > >>> >> >> memory cap'ing passed to the crash kernel inside >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): >> > >>> >> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) >> > >>> >> >> { >> > >>> >> >> struct memblock_region reg = { >> > >>> >> >> .size = 0, >> > >>> >> >> }; >> > >>> >> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> > >>> >> >> >> > >>> >> >> if (reg.size) >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> > >>> >> >> comment this out */ >> > >>> >> >> } >> > >>> >> > >> > >>> >> > Please just don't do that. It can cause a fatal damage on >> > >>> >> > memory contents of the *crashed* kernel. >> > >>> >> > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >> > >>> >> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> > >>> >> >> fail. >> > >>> >> >> >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> > >>> >> >> dt node 'linux,usable-memory-range' >> > >>> >> > >> > >>> >> > I still don't understand why we need to carry over the information >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> > >>> >> > such regions are free to be reused by the kernel after some point of >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> > >>> >> > >> > >>> >> >> > >>> >> Not really. 
According to the UEFI spec, they can be reclaimed after >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> > >>> >> memblock_reserve()'d now. >> > >>> > >> > >>> > For my better understandings, who is actually accessing such regions >> > >>> > during boot time, uefi itself or efistub? >> > >>> > >> > >>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. For >> > >>> instance, on QEMU we have >> > >>> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> > >>> 01000013) >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> > >>> BXPC 00000001) >> > >>> >> > >>> covered by >> > >>> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> > >>> ... >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> > >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> > >> UEFI boot services. >> > >> >> > >>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> > >>> >> when booting the next kernel. >> > >>> > >> > >>> > not really. 
>> > >>> > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> > >>> >> > on crash dump kernel?) >> > >>> >> > >> > >>> >> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim >> > >>> >> regions only revealed the bug, not created it (given that other >> > >>> >> memblock_reserve regions may be affected as well) >> > >>> > >> > >>> > As whether we should honor such reserved regions over kexec'ing >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. >> > >>> > As a matter of fact, no information about "reserved" memblocks is >> > >>> > exposed to user space (via proc/iomem). >> > >>> > >> > >>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> > >>> as 'System RAM'. Do you think that could solve this? >> > >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> > >> marking them under another name in /proc/iomem would also be good in order >> > >> not to allocate them as part of crash kernel's memory. >> > >> >> > > >> > > I agree. However, this may not be entirely trivial, since iterating >> > > over the memblock_reserved table and creating iomem entries may result >> > > in collisions. >> > >> > I found a method (using the patch I shared earlier in this thread) to mark these >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> > reserved regions. >> > >> > >> But I'm not still convinced that we should export them in useable- >> > >> memory-range to crash dump kernel. They will be accessed through >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> > >> (or memblocks), I guess. >> > > >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> > > which is exactly what we want in this case. >> > >> > Now this is what is confusing me. I don't see the above happening. 
>> > >> > I see that the primary kernel boots up and adds the ACPI regions via: >> > acpi_os_ioremap >> > -> ioremap_cache >> > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> > variant. > > It is natural if that region is out of memblocks. Thanks for the confirmation. This was my understanding as well. >> > And it fails while accessing the ACPI tables: >> > >> > [ 0.039205] ACPI: Core revision 20170728 >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > As ioremap() makes the mapping as "Device memory", unaligned memory > access won't be allowed. > >> > [ 0.100022] Modules linked in: >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> > pstate: 60000045 >> > [ 0.132647] sp : ffff000008ccfb40 >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 
0000000000000001 >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> > [ 0.223224] Call trace: >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> > ffff0000095e3980 ffff000008ccfbe0 >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> > ffff000008ccfc50 0000000000000000 >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> > 00000000ffffff76 0000000000000006 >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> > 000000000000038e 0000000000000000 >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> > 0000000000000005 000000000000001b >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> > ffff000009710027 0000000000000001 >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> > 0000000000000000 ffff0000088be820 >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> > ffff00000849b4f8 ffff000008ccfb40 >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> > ffff000008ccfb40 ffff000008260a18 >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> > ffff000008ccfb40 ffff0000084a6764 >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> > [ 0.371723] 
[<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> > [ 0.404437] Rebooting in 10 seconds. >> > >> > So, I think the linear mapping done by the primary kernel does not >> > make these accessible in the crash kernel directly. >> > >> > Any pointers? >> >> Can you get the code line number for acpi_ns_lookup+0x25c? > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned > accesses? > (I didn't find out how unaligned accesses could happen there.) > Right. As I noted somewhere in this thread (perhaps the first email on this subject), this is indeed an unaligned address access. Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding assigning this memory range as device memory doesn't seem a neat solution as it means we are not marking something with the right memory attribute and we can run into similar/related issues later. Regarding the latter suggestion, what I am seeing now is that the acpi table access functions are perhaps reused from the earlier x86 implementation, but on the arm64 (or even arm) arch we should not be allowing unaligned accesses which might cause UNDEFINED behaviour and a resultant crash. So I can try this approach and see if it works for me. However, I am still not very sure as to why the crashkernel ranges historically do not include the System RAM regions (which may include the ACPI regions as well). These regions are available for the kernel usage and perhaps should be exported to the crashkernel as well. 
I am not fully aware of the previous discussions on capp'ing the crashkernel memory being passed to the kdump kernel, but did we run into any issues while doing so? Also, even if I extend the kexec-tools to modify the linux,usable-memory-range and add the ACPI regions to it, the crashkernel fails to boot with the below message (I have added some logic to print the DTB on the crash kernel boot start): [ 0.000000] chosen { [ 0.000000] linux,usable-memory-range [ 0.000000] = < [ 0.000000] 0x00000000 [ 0.000000] 0x0e800000 [ 0.000000] 0x00000000 [ 0.000000] 0x20000000 [ 0.000000] 0x00000000 [ 0.000000] 0x396c0000 [ 0.000000] 0x00000000 [ 0.000000] 0x000a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x39770000 [ 0.000000] 0x00000000 [ 0.000000] 0x00040000 [ 0.000000] 0x00000000 [ 0.000000] 0x398a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x00020000 [ 0.000000] > [ 0.000000] ; [snip..] [ 0.000000] linux,usable-memory-range base e800000, size 20000000 [ 0.000000] - e800000 , 20000000 [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 [ 0.000000] - 396c0000 , a0000 [ 0.000000] linux,usable-memory-range base 39770000, size 40000 [ 0.000000] - 39770000 , 40000 [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 [ 0.000000] - 398a0000 , 20000 [ 0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ... 
[ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 arm64_memblock_init+0x210/0x484 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] pstate: 600000c5 [ 0.000000] sp : ffff000008ccfe80 [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 [ 0.000000] Call trace: [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) [ 0.000000] fd40: 0000000000000056 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] fd60: 0000000000000001 ffff000008c96360 000000000000000d 746f6f622072756f [ 0.000000] fd80: ffff000008517414 00000000000000f4 2065687420616976 6d207261656e696c [ 0.000000] fda0: 2d20676e69707061 657361656c70202d 79206b6365686320 000000002be00842 [ 0.000000] fdc0: ffff000008d05580 0000000000000000 000000000c283806 ffff000008afa000 [ 0.000000] fde0: ffff000008080000 ffff000008afa000 ffff000009680000 ffff000008ec0000 [ 0.000000] fe00: 
ffff000008cf3000 000000000fe80000 00000000013b0000 0000000011230000 [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 ffff000008b76984 ffff000008ccfe80 [ 0.000000] fe40: ffff000008b76984 00000000600000c5 ffff00000959b7a8 ffff000008ec0000 [ 0.000000] fe60: ffffffffffffffff 0000000000000005 ffff000008ccfe80 ffff000008b76984 [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x50/0x6c with crng_init=0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr [ 0.000000] cma: Failed to reserve 512 MiB [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W ------------ 4.14.0+ #7 [ 0.000000] Call trace: [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] I guess it is because of the 1G alignment requirement between the kernel image and the initrd and how we populate the holes between the kernel image, segments (including dtb) and the initrd from the kexec-tools. Akashi, any pointers on this will be helpful as well. 
Regards, Bhupesh >> > >> > Regards, >> > Bhupesh >> > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> > >> via a kernel command line parameter, "memmap=". >> > >> >> > _______________________________________________ >> > kexec mailing list -- kexec at lists.fedoraproject.org >> > To unsubscribe send an email to kexec-leave at lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
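The 'linux,usable-memory-range' dump in the message above lists raw 32-bit cells, and the following lines of the log show them paired up as base/size. A minimal sketch of that pairing is given here, assuming #address-cells and #size-cells are both 2 (as the dump suggests) and that the cells have already been converted from the big-endian FDT encoding to host order; the struct and function names are hypothetical and are taken neither from the kernel nor from kexec-tools.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration: one decoded (base, size) pair. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine 'ncells' 32-bit cells (host order, most significant first). */
static uint64_t read_cells(const uint32_t *cells, int ncells)
{
	uint64_t v = 0;

	for (int i = 0; i < ncells; i++)
		v = (v << 32) | cells[i];
	return v;
}

/*
 * Decode a flat cell array into (base, size) pairs.
 * Returns the number of ranges written to 'out' (at most 'max_out').
 * A trailing partial entry, if any, is ignored.
 */
static size_t decode_ranges(const uint32_t *cells, size_t ncells_total,
			    int addr_cells, int size_cells,
			    struct mem_range *out, size_t max_out)
{
	size_t stride = (size_t)addr_cells + (size_t)size_cells;
	size_t n = 0;

	if (addr_cells <= 0 || size_cells <= 0)
		return 0;

	for (size_t off = 0; off + stride <= ncells_total && n < max_out;
	     off += stride, n++) {
		out[n].base = read_cells(cells + off, addr_cells);
		out[n].size = read_cells(cells + off + addr_cells, size_cells);
	}
	return n;
}
```

Fed the sixteen cells from the dump, this yields four ranges, the first being base 0xe800000 with size 0x20000000, matching the "base ..., size ..." lines printed by the crash kernel.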
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 8:59 ` Bhupesh SHARMA 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh SHARMA @ 2017-12-18 8:59 UTC (permalink / raw) To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it >> to kexec@lists.infradead.org >> >> Also add linux-acpi list > > Thank you. > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> > <ard.biesheuvel@linaro.org> wrote: >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> > > <takahiro.akashi@linaro.org> wrote: >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> > >>> <takahiro.akashi@linaro.org> wrote: >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> > >>> >> <takahiro.akashi@linaro.org> wrote: >> > >>> >> > Bhupesh, Ard, >> > >>> >> > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> > >>> >> >> Hi Ard, Akashi >> > >>> >> >> >> > >>> >> > (snip) >> > >>> >> > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. 
>> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> > >>> >> >> , for details) >> > >>> >> > >> > >>> >> > Right. >> > >>> >> > >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> > >>> >> >> with the crashkernel memory range: >> > >>> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> > >>> >> >> address_cells, size_cells); >> > >>> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> > >>> >> >> , for details) >> > >>> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> > >>> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> > >>> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> > >>> >> >> -r`.img --reuse-cmdline -d >> > >>> >> >> >> > >>> >> >> [snip..] 
>> > >>> >> >> >> > >>> >> >> Reserved memory range >> > >>> >> >> 000000000e800000-000000002e7fffff (0) >> > >>> >> >> >> > >>> >> >> Coredump memory ranges >> > >>> >> >> 0000000000000000-000000000e7fffff (0) >> > >>> >> >> 000000002e800000-000000003961ffff (0) >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) >> > >>> >> >> 000000a000000000-000000affbffffff (0) >> > >>> >> >> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> > >>> >> >> memory cap'ing passed to the crash kernel inside >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): >> > >>> >> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) >> > >>> >> >> { >> > >>> >> >> struct memblock_region reg = { >> > >>> >> >> .size = 0, >> > >>> >> >> }; >> > >>> >> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> > >>> >> >> >> > >>> >> >> if (reg.size) >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> > >>> >> >> comment this out */ >> > >>> >> >> } >> > >>> >> > >> > >>> >> > Please just don't do that. It can cause a fatal damage on >> > >>> >> > memory contents of the *crashed* kernel. >> > >>> >> > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >> > >>> >> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> > >>> >> >> fail. >> > >>> >> >> >> > >>> >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> > >>> >> >> dt node 'linux,usable-memory-range' >> > >>> >> > >> > >>> >> > I still don't understand why we need to carry over the information >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> > >>> >> > such regions are free to be reused by the kernel after some point of >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> > >>> >> > >> > >>> >> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> > >>> >> memblock_reserve()'d now. >> > >>> > >> > >>> > For my better understandings, who is actually accessing such regions >> > >>> > during boot time, uefi itself or efistub? >> > >>> > >> > >>> >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For >> > >>> instance, on QEMU we have >> > >>> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> > >>> 01000013) >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> > >>> BXPC 00000001) >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> > >>> BXPC 00000001) >> > >>> >> > >>> covered by >> > >>> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> > >>> ... >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> > >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> > >> UEFI boot services. >> > >> >> > >>> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> > >>> >> when booting the next kernel. >> > >>> > >> > >>> > not really. >> > >>> > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> > >>> >> > on crash dump kernel?) >> > >>> >> > >> > >>> >> >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim >> > >>> >> regions only revealed the bug, not created it (given that other >> > >>> >> memblock_reserve regions may be affected as well) >> > >>> > >> > >>> > As whether we should honor such reserved regions over kexec'ing >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. 
>> > >>> > As a matter of fact, no information about "reserved" memblocks is >> > >>> > exposed to user space (via proc/iomem). >> > >>> > >> > >>> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> > >>> as 'System RAM'. Do you think that could solve this? >> > >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> > >> marking them under another name in /proc/iomem would also be good in order >> > >> not to allocate them as part of crash kernel's memory. >> > >> >> > > >> > > I agree. However, this may not be entirely trivial, since iterating >> > > over the memblock_reserved table and creating iomem entries may result >> > > in collisions. >> > >> > I found a method (using the patch I shared earlier in this thread) to mark these >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> > reserved regions. >> > >> > >> But I'm not still convinced that we should export them in useable- >> > >> memory-range to crash dump kernel. They will be accessed through >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> > >> (or memblocks), I guess. >> > > >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> > > which is exactly what we want in this case. >> > >> > Now this is what is confusing me. I don't see the above happening. >> > >> > I see that the primary kernel boots up and adds the ACPI regions via: >> > acpi_os_ioremap >> > -> ioremap_cache >> > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> > variant. > > It is natural if that region is out of memblocks. Thanks for the confirmation. This was my understanding as well. 
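A note on why the crash kernel ends up on the plain ioremap() path: the sketch below is a minimal illustration, not kernel code. It assumes the physical address of the ACPI table can be read from bits [47:12] of the PTE printed in the oops quoted in this thread, and it treats the "Reserved memory range" printed by 'kexec -p -d' as the only range the crash kernel keeps in memblock. The helper names pte_to_phys() and in_range() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the output address from an arm64 page
 * descriptor, assuming it sits in bits [47:12] (the low four of those
 * bits are zero in the PTE from the oops, so the result is page aligned).
 */
static uint64_t pte_to_phys(uint64_t pte)
{
	return pte & 0x0000fffffffff000ULL;
}

/* Hypothetical helper: is addr inside the inclusive range [start, end]? */
static bool in_range(uint64_t addr, uint64_t start, uint64_t end)
{
	return addr >= start && addr <= end;
}

/* "Reserved memory range" from the kexec -p -d output in this thread. */
static const uint64_t crash_start = 0x000000000e800000ULL;
static const uint64_t crash_end   = 0x000000002e7fffffULL;
```

Calling pte_to_phys() on the value 00e8000039710707 from the oops and passing the result to in_range() with crash_start and crash_end shows that the table lies outside the crash kernel's usable range, which is consistent with the region not being in memblock and therefore being mapped with ioremap() rather than ioremap_cache().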
>> > And it fails while accessing the ACPI tables: >> > >> > [ 0.039205] ACPI: Core revision 20170728 >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > As ioremap() makes the mapping as "Device memory", unaligned memory > access won't be allowed. > >> > [ 0.100022] Modules linked in: >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> > pstate: 60000045 >> > [ 0.132647] sp : ffff000008ccfb40 >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> > [ 0.223224] Call trace: >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> > 
ffff0000095e3980 ffff000008ccfbe0 >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> > ffff000008ccfc50 0000000000000000 >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> > 00000000ffffff76 0000000000000006 >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> > 000000000000038e 0000000000000000 >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> > 0000000000000005 000000000000001b >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> > ffff000009710027 0000000000000001 >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> > 0000000000000000 ffff0000088be820 >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> > ffff00000849b4f8 ffff000008ccfb40 >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> > ffff000008ccfb40 ffff000008260a18 >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> > ffff000008ccfb40 ffff0000084a6764 >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> > [ 0.404437] Rebooting 
in 10 seconds. >> > >> > So, I think the linear mapping done by the primary kernel does not >> > make these accessible in the crash kernel directly. >> > >> > Any pointers? >> >> Can you get the code line number for acpi_ns_lookup+0x25c? > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned > accesses? > (I didn't find out how unaligned accesses could happen there.) > Right. Like I captured somewhere in this thread (perhaps the first email on this subject), this is indeed an unaligned address access. Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding assigning this memory range as device memory doesn't seem a neat solution as it means we are not marking something with the right memory attribute and we can run into similar/related issues later. Regarding the latter suggestion, what I am seeing now is that the acpi table access functions are perhaps reused from the earlier x86 implementation, but on the arm64 (or even arm) arch we should not be allowing unaligned accesses which might cause UNDEFINED behaviour and a resultant crash. So I can try this approach and see if it works for me. However, I am still not very sure as to why the crashkernel ranges historically do not include the System RAM regions (which may include the ACPI regions as well). These regions are available for kernel usage and perhaps should be exported to the crashkernel as well. I am not fully aware of the previous discussions on capping the crashkernel memory being passed to the kdump kernel, but did we run into any issues while doing so?
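For reference, the alignment-fault reading of the oops discussed above can be checked with a small sketch. It assumes the standard ESR_ELx layout (exception class in bits [31:26], data fault status code in bits [5:0]) and uses x22 from the register dump as the faulting address, on the assumption that the faulting instruction (b94002c0) is a 32-bit load through x22, as its encoding suggests. The helper names are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#define EC_DABT_CUR_EL  0x25	/* Data Abort taken without a change in exception level */
#define DFSC_ALIGNMENT  0x21	/* Alignment fault */

/* Exception class: ESR bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data fault status code: ESR bits [5:0]. */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* True if addr is naturally aligned for an access of 'size' bytes (power of two). */
static bool is_aligned(uint64_t addr, uint64_t size)
{
	return (addr & (size - 1)) == 0;
}
```

With esr = 0x96000021, esr_ec() yields EC_DABT_CUR_EL and esr_dfsc() yields DFSC_ALIGNMENT, and is_aligned(0xffff000009710027, 4) is false, which matches the description of an unaligned access to a region mapped as Device memory.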
Also, even if I extend the kexec-tools to modify the linux,usable-memory-range and add the ACPI regions to it, the crashkernel fails to boot with the below message (I have added some logic to print the DTB on the crash kernel boot start): [ 0.000000] chosen { [ 0.000000] linux,usable-memory-range [ 0.000000] = < [ 0.000000] 0x00000000 [ 0.000000] 0x0e800000 [ 0.000000] 0x00000000 [ 0.000000] 0x20000000 [ 0.000000] 0x00000000 [ 0.000000] 0x396c0000 [ 0.000000] 0x00000000 [ 0.000000] 0x000a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x39770000 [ 0.000000] 0x00000000 [ 0.000000] 0x00040000 [ 0.000000] 0x00000000 [ 0.000000] 0x398a0000 [ 0.000000] 0x00000000 [ 0.000000] 0x00020000 [ 0.000000] > [ 0.000000] ; [snip..] [ 0.000000] linux,usable-memory-range base e800000, size 20000000 [ 0.000000] - e800000 , 20000000 [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 [ 0.000000] - 396c0000 , a0000 [ 0.000000] linux,usable-memory-range base 39770000, size 40000 [ 0.000000] - 39770000 , 40000 [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 [ 0.000000] - 398a0000 , 20000 [ 0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ... 
[ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 arm64_memblock_init+0x210/0x484 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] pstate: 600000c5 [ 0.000000] sp : ffff000008ccfe80 [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 [ 0.000000] Call trace: [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) [ 0.000000] fd40: 0000000000000056 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] fd60: 0000000000000001 ffff000008c96360 000000000000000d 746f6f622072756f [ 0.000000] fd80: ffff000008517414 00000000000000f4 2065687420616976 6d207261656e696c [ 0.000000] fda0: 2d20676e69707061 657361656c70202d 79206b6365686320 000000002be00842 [ 0.000000] fdc0: ffff000008d05580 0000000000000000 000000000c283806 ffff000008afa000 [ 0.000000] fde0: ffff000008080000 ffff000008afa000 ffff000009680000 ffff000008ec0000 [ 0.000000] fe00: 
ffff000008cf3000 000000000fe80000 00000000013b0000 0000000011230000 [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 ffff000008b76984 ffff000008ccfe80 [ 0.000000] fe40: ffff000008b76984 00000000600000c5 ffff00000959b7a8 ffff000008ec0000 [ 0.000000] fe60: ffffffffffffffff 0000000000000005 ffff000008ccfe80 ffff000008b76984 [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x50/0x6c with crng_init=0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr [ 0.000000] cma: Failed to reserve 512 MiB [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W ------------ 4.14.0+ #7 [ 0.000000] Call trace: [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x0000000000010000 bytes below 0x0000000000000000. [ 0.000000] I guess it is because of the 1G alignment requirement between the kernel image and the initrd and how we populate the holes between the kernel image, segments (including dtb) and the initrd from the kexec-tools. Akashi, any pointers on this will be helpful as well. 
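To make the extended property easier to read, here is a minimal sketch of how the cells printed in the boot log above map to (base, size) pairs. It assumes #address-cells = 2 and #size-cells = 2, which is consistent with the 16 cells producing the four ranges shown in the log, and it takes cells that are already in host byte order (a real FDT stores them big-endian, so a parser would convert each cell first). The struct and function names are hypothetical and are not taken from kexec-tools or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Combine two 32-bit cells (host byte order) into one 64-bit value. */
static uint64_t read_cells2(const uint32_t *cells)
{
	return ((uint64_t)cells[0] << 32) | cells[1];
}

/*
 * Decode cells laid out as <base-hi base-lo size-hi size-lo> ...
 * Returns the number of ranges written to 'out' (at most 'max').
 */
static size_t decode_usable_ranges(const uint32_t *cells, size_t ncells,
				   struct mem_range *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i + 4 <= ncells && n < max; i += 4, n++) {
		out[n].base = read_cells2(&cells[i]);
		out[n].size = read_cells2(&cells[i + 2]);
	}
	return n;
}

/* Cell values as printed in the crash kernel boot log above. */
static const uint32_t usable_cells[] = {
	0x00000000, 0x0e800000, 0x00000000, 0x20000000,
	0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
	0x00000000, 0x39770000, 0x00000000, 0x00040000,
	0x00000000, 0x398a0000, 0x00000000, 0x00020000,
};
```

Decoding usable_cells this way gives the crashkernel range (base 0xe800000, size 0x20000000) followed by the three small ACPI ranges, the same four pairs that the debug prints in the log report.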
Regards, Bhupesh >> > >> > Regards, >> > Bhupesh >> > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> > >> via a kernel command line parameter, "memmap=". >> > >> >> > _______________________________________________ >> > kexec mailing list -- kexec@lists.fedoraproject.org >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-18 8:59 ` Bhupesh SHARMA (?) (?) @ 2017-12-18 11:18 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw) To: Bhupesh SHARMA Cc: Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, James Morse, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, Matt Fleming Bhupesh, On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: > On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: > >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it > >> to kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org > >> > >> Also add linux-acpi list > > > > Thank you. 
> > > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > >> > <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> > > On 15 December 2017 at 09:59, AKASHI Takahiro > >> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >> > >>> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> > >>> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> > >>> >> > Bhupesh, Ard, > >> > >>> >> > > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> > >>> >> >> Hi Ard, Akashi > >> > >>> >> >> > >> > >>> >> > (snip) > >> > >>> >> > > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any > >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. > >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> > >>> >> >> , for details) > >> > >>> >> > > >> > >>> >> > Right. > >> > >>> >> > > >> > >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >> > >>> >> >> with the crashkernel memory range: > >> > >>> >> >> > >> > >>> >> >> /* add linux,usable-memory-range */ > >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> > >>> >> >> address_cells, size_cells); > >> > >>> >> >> > >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> > >>> >> >> , for details) > >> > >>> >> >> > >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> > >>> >> >> they are marked as System RAM or as RESERVED. As, > >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> > >>> >> >> > >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> > >>> >> >> ACPI memory and crashes while trying to access the same: > >> > >>> >> >> > >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> > >>> >> >> -r`.img --reuse-cmdline -d > >> > >>> >> >> > >> > >>> >> >> [snip..] > >> > >>> >> >> > >> > >>> >> >> Reserved memory range > >> > >>> >> >> 000000000e800000-000000002e7fffff (0) > >> > >>> >> >> > >> > >>> >> >> Coredump memory ranges > >> > >>> >> >> 0000000000000000-000000000e7fffff (0) > >> > >>> >> >> 000000002e800000-000000003961ffff (0) > >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) > >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) > >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) > >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) > >> > >>> >> >> 000000a000000000-000000affbffffff (0) > >> > >>> >> >> > >> > >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > >> > >>> >> >> memory cap'ing passed to the crash kernel inside > >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): > >> > >>> >> >> > >> > >>> >> >> static void __init fdt_enforce_memory_region(void) > >> > >>> >> >> { > >> > >>> >> >> struct memblock_region reg = { > >> > >>> >> >> .size = 0, > >> > >>> >> >> }; > >> > >>> >> >> > >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> > >>> >> >> > >> > >>> >> >> if (reg.size) > >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> > >>> >> >> comment this out */ > >> > >>> >> >> } > >> > >>> >> > > >> > >>> >> > Please just don't do that. It can cause a fatal damage on > >> > >>> >> > memory contents of the *crashed* kernel. > >> > >>> >> > > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. > >> > >>> >> >> > >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> > >>> >> >> fail. > >> > >>> >> >> > >> > >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> > >>> >> >> dt node 'linux,usable-memory-range' > >> > >>> >> > > >> > >>> >> > I still don't understand why we need to carry over the information > >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> > >>> >> > such regions are free to be reused by the kernel after some point of > >> > >>> >> > initialization. Why does crash dump kernel need to know about them? > >> > >>> >> > > >> > >>> >> > >> > >>> >> Not really.
According to the UEFI spec, they can be reclaimed after > >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec > >> > >>> >> kernel, those regions needs to be preserved, which is why they are > >> > >>> >> memblock_reserve()'d now. > >> > >>> > > >> > >>> > For my better understandings, who is actually accessing such regions > >> > >>> > during boot time, uefi itself or efistub? > >> > >>> > > >> > >>> > >> > >>> No, only the kernel. This is where the ACPI tables are stored. For > >> > >>> instance, on QEMU we have > >> > >>> > >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> > >>> 01000013) > >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> > >>> BXPC 00000001) > >> > >>> > >> > >>> covered by > >> > >>> > >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> > >>> ... > >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> > >> > >> > >> OK. I mistakenly understood those regions could be freed after exiting > >> > >> UEFI boot services. > >> > >> > >> > >>> > >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table > >> > >>> >> when booting the next kernel. 
> >> > >>> > > >> > >>> > not really. > >> > >>> > > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >> > >>> >> > on crash dump kernel?) > >> > >>> >> > > >> > >>> >> > >> > >>> >> I don't think so. And the change to the handling of ACPI reclaim > >> > >>> >> regions only revealed the bug, not created it (given that other > >> > >>> >> memblock_reserve regions may be affected as well) > >> > >>> > > >> > >>> > As whether we should honor such reserved regions over kexec'ing > >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. > >> > >>> > As a matter of fact, no information about "reserved" memblocks is > >> > >>> > exposed to user space (via proc/iomem). > >> > >>> > > >> > >>> > >> > >>> That is why I suggested (somewhere in this thread?) to not expose them > >> > >>> as 'System RAM'. Do you think that could solve this? > >> > >> > >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> > >> marking them under another name in /proc/iomem would also be good in order > >> > >> not to allocate them as part of crash kernel's memory. > >> > >> > >> > > > >> > > I agree. However, this may not be entirely trivial, since iterating > >> > > over the memblock_reserved table and creating iomem entries may result > >> > > in collisions. > >> > > >> > I found a method (using the patch I shared earlier in this thread) to mark these > >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or > >> > reserved regions. > >> > > >> > >> But I'm not still convinced that we should export them in useable- > >> > >> memory-range to crash dump kernel. They will be accessed through > >> > >> acpi_os_map_memory() and so won't be required to be part of system ram > >> > >> (or memblocks), I guess. > >> > > > >> > > Agreed. 
They will be covered by the linear mapping in the boot kernel, > >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > >> > > which is exactly what we want in this case. > >> > > >> > Now this is what is confusing me. I don't see the above happening. > >> > > >> > I see that the primary kernel boots up and adds the ACPI regions via: > >> > acpi_os_ioremap > >> > -> ioremap_cache > >> > > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls > >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > >> > variant. > > > > It is natural if that region is out of memblocks. > > Thanks for the confirmation. This was my understanding as well. > > >> > And it fails while accessing the ACPI tables: > >> > > >> > [ 0.039205] ACPI: Core revision 20170728 > >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > > As ioremap() makes the mapping as "Device memory", unaligned memory > > access won't be allowed. 
> > > >> > [ 0.100022] Modules linked in: > >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > >> > pstate: 60000045 > >> > [ 0.132647] sp : ffff000008ccfb40 > >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > >> > [ 0.223224] Call trace: > >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > >> > ffff0000095e3980 ffff000008ccfbe0 > >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > >> > ffff000008ccfc50 0000000000000000 > >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > >> > 00000000ffffff76 0000000000000006 > >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > >> > 000000000000038e 0000000000000000 > >> > [ 0.263843] fa80: 
0000000000000000 0000000000000000 > >> > 0000000000000005 000000000000001b > >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > >> > ffff000009710027 0000000000000001 > >> > [ 0.279667] fac0: 0000000000000001 000000000000001b > >> > 0000000000000000 ffff0000088be820 > >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > >> > ffff00000849b4f8 ffff000008ccfb40 > >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > >> > ffff000008ccfb40 ffff000008260a18 > >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > >> > ffff000008ccfb40 ffff0000084a6764 > >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > >> > [ 0.399160] Kernel panic - not syncing: Fatal exception > >> > [ 0.404437] Rebooting in 10 seconds. > >> > > >> > So, I think the linear mapping done by the primary kernel does not > >> > make these accessible in the crash kernel directly. > >> > > >> > Any pointers? > >> > >> Can you get the code line number for acpi_ns_lookup+0x25c? 
> > > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > > modify acpi_ns_lookup() (or any other ACPI functions) to prevent unaligned > > accesses? > > (I didn't find out how unaligned accesses could happen there.) > > > > Right. As I noted somewhere in this thread (perhaps in the first > email on this subject), > this is indeed an unaligned address access. > > Now, modifying acpi_os_ioremap() to not call ioremap(), and thus avoiding > mapping this memory range > as device memory, doesn't seem a neat solution, as it means we are not > marking something with the right memory attribute and we can run into > similar/related issues later. > > Regarding the latter suggestion, what I am seeing now is that the ACPI > table access functions are perhaps reused from the earlier x86 > implementation, but on the arm64 (or even arm) arch we should not be > allowing unaligned accesses which might cause UNDEFINED behaviour and > a resultant crash. > > So I can try this approach and see if it works for me. > > However, I am still not very sure as to why the crashkernel ranges > historically do not include the System RAM regions (which may include > the ACPI regions as well). These regions are available for the kernel's > use and perhaps should be exported to the crashkernel as well. > > I am not fully aware of the previous discussions on capping the > crashkernel memory being passed to the kdump kernel, but did we run > into any issues while doing so? 
> > Also, even if I extend the kexec-tools to modify the > linux,usable-memory-range and add the ACPI regions to it, the > crashkernel fails to boot with the below message (I have added some > logic to print the DTB on the crash kernel boot start): > > [ 0.000000] chosen { > [ 0.000000] linux,usable-memory-range > [ 0.000000] = < > [ 0.000000] 0x00000000 > [ 0.000000] 0x0e800000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x20000000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x396c0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x000a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x39770000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00040000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x398a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00020000 > [ 0.000000] > > [ 0.000000] ; > > [snip..] > > [ 0.000000] linux,usable-memory-range base e800000, size 20000000 > [ 0.000000] - e800000 , 20000000 > [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 > [ 0.000000] - 396c0000 , a0000 > [ 0.000000] linux,usable-memory-range base 39770000, size 40000 > [ 0.000000] - 39770000 , 40000 > [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 > [ 0.000000] - 398a0000 , 20000 > [ 0.000000] initrd not fully accessible via the linear mapping -- > please check your bootloader ... 
> [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 > arm64_memblock_init+0x210/0x484 > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 > [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 > [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 > [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 > [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] > pstate: 600000c5 > [ 0.000000] sp : ffff000008ccfe80 > [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 > [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 > [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 > [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 > [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 > [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 > [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 > [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 > [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 > [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 > [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 > [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d > [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 > [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 > [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 > [ 0.000000] Call trace: > [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) > [ 0.000000] fd40: 0000000000000056 0000000000000000 > 0000000000000000 0000000000000000 > [ 0.000000] fd60: 0000000000000001 ffff000008c96360 > 000000000000000d 746f6f622072756f > [ 0.000000] fd80: ffff000008517414 00000000000000f4 > 2065687420616976 6d207261656e696c > [ 0.000000] fda0: 2d20676e69707061 657361656c70202d > 79206b6365686320 000000002be00842 > [ 0.000000] fdc0: ffff000008d05580 0000000000000000 > 000000000c283806 ffff000008afa000 > [ 0.000000] fde0: ffff000008080000 
ffff000008afa000 > ffff000009680000 ffff000008ec0000 > [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 > 00000000013b0000 0000000011230000 > [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 > ffff000008b76984 ffff000008ccfe80 > [ 0.000000] fe40: ffff000008b76984 00000000600000c5 > ffff00000959b7a8 ffff000008ec0000 > [ 0.000000] fe60: ffffffffffffffff 0000000000000005 > ffff000008ccfe80 ffff000008b76984 > [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 > [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] random: get_random_bytes called from > print_oops_end_marker+0x50/0x6c with crng_init=0 > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr > [ 0.000000] cma: Failed to reserve 512 MiB > [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate > 0x0000000000010000 bytes below 0x0000000000000000. > [ 0.000000] > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W > ------------ 4.14.0+ #7 > [ 0.000000] Call trace: > [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c > [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c > [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 > [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 > [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c > [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 > [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 > [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 > [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to > allocate 0x0000000000010000 bytes below 0x0000000000000000. 
> [ 0.000000] > > I guess it is because of the 1G alignment requirement between the > kernel image and the initrd and how we populate the holes between the > kernel image, segments (including dtb) and the initrd from the > kexec-tools. > > Akashi, any pointers on this will be helpful as well. Please show me: * "Virtual kernel memory layout" in dmesg * /proc/iomem * debug messages from kexec-tools (kexec -d) -Takahiro AKASHI > Regards, > Bhupesh > > > >> > > >> > Regards, > >> > Bhupesh > >> > > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> > >> via a kernel command line parameter, "memmap=". > >> > >> > >> > _______________________________________________ > >> > kexec mailing list -- kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org > >> > To unsubscribe send an email to kexec-leave-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org ^ permalink raw reply [flat|nested] 135+ messages in thread
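For readers following the DTB dump quoted in the message above: the cells of 'linux,usable-memory-range' pair up as (base, size) entries. The following is only a minimal sketch of that pairing, assuming #address-cells = 2 and #size-cells = 2 (which is what the four decoded "base ..., size ..." lines in the log imply) and assuming the cells have already been converted to host byte order. The names mem_range, fold_cells, decode_usable_ranges and dumped_prop are made up for illustration; this is not the kernel's or kexec-tools' parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for one decoded (base, size) entry. */
struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* Fold ncells 32-bit cells (most significant cell first) into one value. */
static uint64_t fold_cells(const uint32_t *cells, int ncells)
{
    uint64_t v = 0;

    for (int i = 0; i < ncells; i++)
        v = (v << 32) | cells[i];
    return v;
}

/*
 * Decode up to max_out (base, size) entries from a property made of
 * prop_cells host-order cells. Returns the number of entries decoded;
 * a trailing partial entry is ignored.
 */
static size_t decode_usable_ranges(const uint32_t *prop, size_t prop_cells,
                                   int addr_cells, int size_cells,
                                   struct mem_range *out, size_t max_out)
{
    size_t stride = (size_t)addr_cells + (size_t)size_cells;
    size_t n = 0;

    /* Assumption: only 1- or 2-cell addresses and sizes are handled. */
    assert(addr_cells >= 1 && addr_cells <= 2);
    assert(size_cells >= 1 && size_cells <= 2);

    while (n < max_out && (n + 1) * stride <= prop_cells) {
        const uint32_t *p = prop + n * stride;

        out[n].base = fold_cells(p, addr_cells);
        out[n].size = fold_cells(p + addr_cells, size_cells);
        n++;
    }
    return n;
}

/* Cell values copied from the DTB dump quoted in the message above. */
static const uint32_t dumped_prop[] = {
    0x00000000, 0x0e800000, 0x00000000, 0x20000000,
    0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
    0x00000000, 0x39770000, 0x00000000, 0x00040000,
    0x00000000, 0x398a0000, 0x00000000, 0x00020000,
};
```

Calling decode_usable_ranges(dumped_prop, 16, 2, 2, out, 8) yields four entries, the first being base 0xe800000 with size 0x20000000. That entry ends at 0x2e7fffff, which agrees with the "Reserved memory range" printed by kexec -d elsewhere in this thread; the other three entries are the extra ranges added by the modified kexec-tools.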
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 11:18 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw) To: Bhupesh SHARMA Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, Bhupesh Sharma, kexec, linux-kernel, linux-acpi, James Morse, Dave Young, linux-arm-kernel Bhupesh, On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: > On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: > > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: > >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it > >> to kexec@lists.infradead.org > >> > >> Also add linux-acpi list > > > > Thank you. > > > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > >> > <ard.biesheuvel@linaro.org> wrote: > >> > > On 15 December 2017 at 09:59, AKASHI Takahiro > >> > > <takahiro.akashi@linaro.org> wrote: > >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >> > >>> <takahiro.akashi@linaro.org> wrote: > >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> > >>> >> <takahiro.akashi@linaro.org> wrote: > >> > >>> >> > Bhupesh, Ard, > >> > >>> >> > > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> > >>> >> >> Hi Ard, Akashi > >> > >>> >> >> > >> > >>> >> > (snip) > >> > >>> >> > > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any > >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. 
> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> > >>> >> >> , for details) > >> > >>> >> > > >> > >>> >> > Right. > >> > >>> >> > > >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >> > >>> >> >> with the crashkernel memory range: > >> > >>> >> >> > >> > >>> >> >> /* add linux,usable-memory-range */ > >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> > >>> >> >> address_cells, size_cells); > >> > >>> >> >> > >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> > >>> >> >> , for details) > >> > >>> >> >> > >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> > >>> >> >> they are marked as System RAM or as RESERVED. As, > >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> > >>> >> >> > >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> > >>> >> >> ACPI memory and crashes while trying to access the same: > >> > >>> >> >> > >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> > >>> >> >> -r`.img --reuse-cmdline -d > >> > >>> >> >> > >> > >>> >> >> [snip..] 
> >> > >>> >> >> > >> > >>> >> >> Reserved memory range > >> > >>> >> >> 000000000e800000-000000002e7fffff (0) > >> > >>> >> >> > >> > >>> >> >> Coredump memory ranges > >> > >>> >> >> 0000000000000000-000000000e7fffff (0) > >> > >>> >> >> 000000002e800000-000000003961ffff (0) > >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) > >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) > >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) > >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) > >> > >>> >> >> 000000a000000000-000000affbffffff (0) > >> > >>> >> >> > >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the > >> > >>> >> >> memory cap'ing passed to the crash kernel inside > >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): > >> > >>> >> >> > >> > >>> >> >> static void __init fdt_enforce_memory_region(void) > >> > >>> >> >> { > >> > >>> >> >> struct memblock_region reg = { > >> > >>> >> >> .size = 0, > >> > >>> >> >> }; > >> > >>> >> >> > >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> > >>> >> >> > >> > >>> >> >> if (reg.size) > >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> > >>> >> >> comment this out */ > >> > >>> >> >> } > >> > >>> >> > > >> > >>> >> > Please just don't do that. It can cause a fatal damage on > >> > >>> >> > memory contents of the *crashed* kernel. > >> > >>> >> > > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. > >> > >>> >> >> > >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> > >>> >> >> fail. > >> > >>> >> >> > >> > >>> >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are > >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> > >>> >> >> dt node 'linux,usable-memory-range' > >> > >>> >> > > >> > >>> >> > I still don't understand why we need to carry over the information > >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> > >>> >> > such regions are free to be reused by the kernel after some point of > >> > >>> >> > initialization. Why does crash dump kernel need to know about them? > >> > >>> >> > > >> > >>> >> > >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec > >> > >>> >> kernel, those regions needs to be preserved, which is why they are > >> > >>> >> memblock_reserve()'d now. > >> > >>> > > >> > >>> > For my better understandings, who is actually accessing such regions > >> > >>> > during boot time, uefi itself or efistub? > >> > >>> > > >> > >>> > >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For > >> > >>> instance, on QEMU we have > >> > >>> > >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> > >>> 01000013) > >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> > >>> BXPC 00000001) > >> > >>> > >> > >>> covered by > >> > >>> > >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> > >>> ... > >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> > >> > >> > >> OK. I mistakenly understood those regions could be freed after exiting > >> > >> UEFI boot services. > >> > >> > >> > >>> > >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table > >> > >>> >> when booting the next kernel. > >> > >>> > > >> > >>> > not really. > >> > >>> > > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >> > >>> >> > on crash dump kernel?) > >> > >>> >> > > >> > >>> >> > >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > >> > >>> >> regions only revealed the bug, not created it (given that other > >> > >>> >> memblock_reserve regions may be affected as well) > >> > >>> > > >> > >>> > As whether we should honor such reserved regions over kexec'ing > >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. > >> > >>> > As a matter of fact, no information about "reserved" memblocks is > >> > >>> > exposed to user space (via proc/iomem). > >> > >>> > > >> > >>> > >> > >>> That is why I suggested (somewhere in this thread?) to not expose them > >> > >>> as 'System RAM'. Do you think that could solve this? > >> > >> > >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> > >> marking them under another name in /proc/iomem would also be good in order > >> > >> not to allocate them as part of crash kernel's memory. > >> > >> > >> > > > >> > > I agree. However, this may not be entirely trivial, since iterating > >> > > over the memblock_reserved table and creating iomem entries may result > >> > > in collisions. > >> > > >> > I found a method (using the patch I shared earlier in this thread) to mark these > >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or > >> > reserved regions. > >> > > >> > >> But I'm not still convinced that we should export them in useable- > >> > >> memory-range to crash dump kernel. They will be accessed through > >> > >> acpi_os_map_memory() and so won't be required to be part of system ram > >> > >> (or memblocks), I guess. > >> > > > >> > > Agreed. They will be covered by the linear mapping in the boot kernel, > >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > >> > > which is exactly what we want in this case. > >> > > >> > Now this is what is confusing me. I don't see the above happening. 
> >> > > >> > I see that the primary kernel boots up and adds the ACPI regions via: > >> > acpi_os_ioremap > >> > -> ioremap_cache > >> > > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls > >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > >> > variant. > > > > It is natural if that region is out of memblocks. > > Thanks for the confirmation. This was my understanding as well. > > >> > And it fails while accessing the ACPI tables: > >> > > >> > [ 0.039205] ACPI: Core revision 20170728 > >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > > As ioremap() makes the mapping as "Device memory", unaligned memory > > access won't be allowed. > > > >> > [ 0.100022] Modules linked in: > >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > >> > pstate: 60000045 > >> > [ 0.132647] sp : ffff000008ccfb40 > >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > >> > [ 0.194998] 
x7 : 0000000000000000 x6 : ffff000008ccfc50 > >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > >> > [ 0.223224] Call trace: > >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > >> > ffff0000095e3980 ffff000008ccfbe0 > >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > >> > ffff000008ccfc50 0000000000000000 > >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > >> > 00000000ffffff76 0000000000000006 > >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > >> > 000000000000038e 0000000000000000 > >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 > >> > 0000000000000005 000000000000001b > >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > >> > ffff000009710027 0000000000000001 > >> > [ 0.279667] fac0: 0000000000000001 000000000000001b > >> > 0000000000000000 ffff0000088be820 > >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > >> > ffff00000849b4f8 ffff000008ccfb40 > >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > >> > ffff000008ccfb40 ffff000008260a18 > >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > >> > ffff000008ccfb40 ffff0000084a6764 > >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > >> 
> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > >> > [ 0.399160] Kernel panic - not syncing: Fatal exception > >> > [ 0.404437] Rebooting in 10 seconds. > >> > > >> > So, I think the linear mapping done by the primary kernel does not > >> > make these accessible in the crash kernel directly. > >> > > >> > Any pointers? > >> > >> Can you get the code line number for acpi_ns_lookup+0x25c? > > > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned > > accesses? > > (I didn't find out how unaligned accesses could happen there.) > > > > Right. Like I captured somewhere in this thread (perhaps the first > email on this subject), > this is indeed an unaligned address access. > > Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding > assigning this memory range > as device memory doesn't seem a neat solution as it means we are not > marking some thing with the right memory attribute and we can fall in > similar/related issues later. > > Regarding the later suggestion, what I am seeing now is that the acpi > table access functions are perhaps reused from the earlier x86 > implementation, but on the arm64 (or even arm) arch we should not be > allowing unaligned accesses which might cause UNDEFINED behaviour and > resultant crash. > > So I can try going this approach and see if it works for me. 
> > However, I am still not very sure as to why the crashkernel ranges > historically do not include the System RAM regions (which may include > the ACPI regions as well). These regions are available for the kernel > usage and perhaps should be exported to the crashkernel as well. > > I am not fully aware of the previous discussions on capp'ing the > crashkernel memory being passed to the kdump kernel, but did we run > into any issues while doing so? > > Also, even if I extend the kexec-tools to modify the > linux,usable-memory-range and add the ACPI regions to it, the > crashkernel fails to boot with the below message (I have added some > logic to print the DTB on the crash kernel boot start): > > [ 0.000000] chosen { > [ 0.000000] linux,usable-memory-range > [ 0.000000] = < > [ 0.000000] 0x00000000 > [ 0.000000] 0x0e800000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x20000000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x396c0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x000a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x39770000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00040000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x398a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00020000 > [ 0.000000] > > [ 0.000000] ; > > [snip..] > > [ 0.000000] linux,usable-memory-range base e800000, size 20000000 > [ 0.000000] - e800000 , 20000000 > [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 > [ 0.000000] - 396c0000 , a0000 > [ 0.000000] linux,usable-memory-range base 39770000, size 40000 > [ 0.000000] - 39770000 , 40000 > [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 > [ 0.000000] - 398a0000 , 20000 > [ 0.000000] initrd not fully accessible via the linear mapping -- > please check your bootloader ... 
> [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 > arm64_memblock_init+0x210/0x484 > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 > [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 > [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 > [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 > [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] > pstate: 600000c5 > [ 0.000000] sp : ffff000008ccfe80 > [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 > [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 > [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 > [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 > [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 > [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 > [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 > [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 > [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 > [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 > [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 > [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d > [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 > [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 > [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 > [ 0.000000] Call trace: > [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) > [ 0.000000] fd40: 0000000000000056 0000000000000000 > 0000000000000000 0000000000000000 > [ 0.000000] fd60: 0000000000000001 ffff000008c96360 > 000000000000000d 746f6f622072756f > [ 0.000000] fd80: ffff000008517414 00000000000000f4 > 2065687420616976 6d207261656e696c > [ 0.000000] fda0: 2d20676e69707061 657361656c70202d > 79206b6365686320 000000002be00842 > [ 0.000000] fdc0: ffff000008d05580 0000000000000000 > 000000000c283806 ffff000008afa000 > [ 0.000000] fde0: ffff000008080000 
ffff000008afa000 > ffff000009680000 ffff000008ec0000 > [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 > 00000000013b0000 0000000011230000 > [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 > ffff000008b76984 ffff000008ccfe80 > [ 0.000000] fe40: ffff000008b76984 00000000600000c5 > ffff00000959b7a8 ffff000008ec0000 > [ 0.000000] fe60: ffffffffffffffff 0000000000000005 > ffff000008ccfe80 ffff000008b76984 > [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 > [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] random: get_random_bytes called from > print_oops_end_marker+0x50/0x6c with crng_init=0 > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr > [ 0.000000] cma: Failed to reserve 512 MiB > [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate > 0x0000000000010000 bytes below 0x0000000000000000. > [ 0.000000] > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W > ------------ 4.14.0+ #7 > [ 0.000000] Call trace: > [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c > [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c > [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 > [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 > [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c > [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 > [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 > [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 > [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to > allocate 0x0000000000010000 bytes below 0x0000000000000000. 
> [ 0.000000] > > I guess it is because of the 1G alignment requirement between the > kernel image and the initrd and how we populate the holes between the > kernel image, segments (including dtb) and the initrd from the > kexec-tools. > > Akashi, any pointers on this will be helpful as well. Please show me: * "Virtual kernel memory layout" in dmesg * /proc/iomem * debug messages from kexec-tools (kexec -d) -Takahiro AKASHI > Regards, > Bhupesh > > > >> > > >> > Regards, > >> > Bhupesh > >> > > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> > >> via a kernel command line parameter, "memmap=". > >> > >> > >> > _______________________________________________ > >> > kexec mailing list -- kexec@lists.fedoraproject.org > >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 135+ messages in thread
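As a worked check of the remark in the message above that ESR = 0x96000021 means a Data Abort with an Alignment fault, here is a small sketch that splits the value into its fields and inspects the faulting access. It is an illustration only: the macro and function names are made up, the field positions follow my reading of the Armv8-A ESR_ELx layout (EC in bits [31:26], DFSC in ISS bits [5:0]), and the instruction decode assumes the word in parentheses on the oops "Code:" line, b94002c0, is a 32-bit LDR (immediate, unsigned offset).

```c
#include <assert.h>
#include <stdint.h>

/* ESR_ELx fields (illustrative macro names). */
#define ESR_EC(esr)    (((esr) >> 26) & 0x3f)  /* exception class, bits [31:26] */
#define ESR_DFSC(esr)  ((esr) & 0x3f)          /* data fault status code, bits [5:0] */

#define EC_DABT_CUR_EL 0x25  /* data abort taken without a change in exception level */
#define DFSC_ALIGNMENT 0x21  /* alignment fault */

/* Nonzero when esr describes a data abort caused by an alignment fault. */
static int is_alignment_data_abort(uint32_t esr)
{
    return ESR_EC(esr) == EC_DABT_CUR_EL && ESR_DFSC(esr) == DFSC_ALIGNMENT;
}

/* Base register (Rn, bits [9:5]) of an LDR (immediate, unsigned offset). */
static unsigned ldr_base_reg(uint32_t insn)
{
    return (insn >> 5) & 0x1f;
}

/* Nonzero when addr is a multiple of access_size (a power of two). */
static int is_naturally_aligned(uint64_t addr, unsigned access_size)
{
    return (addr & (uint64_t)(access_size - 1)) == 0;
}
```

With the values from the oops: ESR_EC(0x96000021) is 0x25 and ESR_DFSC(0x96000021) is 0x21, so is_alignment_data_abort() returns nonzero. ldr_base_reg(0xb94002c0) is 22, i.e. a 4-byte load through x22, and x22 = ffff000009710027 in the register dump is not 4-byte aligned. Such a load is tolerated on Normal memory but faults on a mapping created with ioremap() (Device memory), which matches the explanation given in the message.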
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 11:18 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw) To: linux-arm-kernel Bhupesh, On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: > On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: > > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: > >> kexec at fedoraproject... is for Fedora kexec scripts discussion, changed it > >> to kexec at lists.infradead.org > >> > >> Also add linux-acpi list > > > > Thank you. > > > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > >> > <ard.biesheuvel@linaro.org> wrote: > >> > > On 15 December 2017 at 09:59, AKASHI Takahiro > >> > > <takahiro.akashi@linaro.org> wrote: > >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >> > >>> <takahiro.akashi@linaro.org> wrote: > >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> > >>> >> <takahiro.akashi@linaro.org> wrote: > >> > >>> >> > Bhupesh, Ard, > >> > >>> >> > > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> > >>> >> >> Hi Ard, Akashi > >> > >>> >> >> > >> > >>> >> > (snip) > >> > >>> >> > > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any > >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. 
> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> > >>> >> >> , for details) > >> > >>> >> > > >> > >>> >> > Right. > >> > >>> >> > > >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >> > >>> >> >> with the crashkernel memory range: > >> > >>> >> >> > >> > >>> >> >> /* add linux,usable-memory-range */ > >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> > >>> >> >> address_cells, size_cells); > >> > >>> >> >> > >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> > >>> >> >> , for details) > >> > >>> >> >> > >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> > >>> >> >> they are marked as System RAM or as RESERVED. As, > >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> > >>> >> >> > >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> > >>> >> >> ACPI memory and crashes while trying to access the same: > >> > >>> >> >> > >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> > >>> >> >> -r`.img --reuse-cmdline -d > >> > >>> >> >> > >> > >>> >> >> [snip..] 
> >> > >>> >> >> > >> > >>> >> >> Reserved memory range > >> > >>> >> >> 000000000e800000-000000002e7fffff (0) > >> > >>> >> >> > >> > >>> >> >> Coredump memory ranges > >> > >>> >> >> 0000000000000000-000000000e7fffff (0) > >> > >>> >> >> 000000002e800000-000000003961ffff (0) > >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) > >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) > >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) > >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) > >> > >>> >> >> 000000a000000000-000000affbffffff (0) > >> > >>> >> >> > >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the > >> > >>> >> >> memory cap'ing passed to the crash kernel inside > >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): > >> > >>> >> >> > >> > >>> >> >> static void __init fdt_enforce_memory_region(void) > >> > >>> >> >> { > >> > >>> >> >> struct memblock_region reg = { > >> > >>> >> >> .size = 0, > >> > >>> >> >> }; > >> > >>> >> >> > >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> > >>> >> >> > >> > >>> >> >> if (reg.size) > >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> > >>> >> >> comment this out */ > >> > >>> >> >> } > >> > >>> >> > > >> > >>> >> > Please just don't do that. It can cause a fatal damage on > >> > >>> >> > memory contents of the *crashed* kernel. > >> > >>> >> > > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. > >> > >>> >> >> > >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> > >>> >> >> fail. > >> > >>> >> >> > >> > >>> >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are > >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> > >>> >> >> dt node 'linux,usable-memory-range' > >> > >>> >> > > >> > >>> >> > I still don't understand why we need to carry over the information > >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> > >>> >> > such regions are free to be reused by the kernel after some point of > >> > >>> >> > initialization. Why does crash dump kernel need to know about them? > >> > >>> >> > > >> > >>> >> > >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec > >> > >>> >> kernel, those regions needs to be preserved, which is why they are > >> > >>> >> memblock_reserve()'d now. > >> > >>> > > >> > >>> > For my better understandings, who is actually accessing such regions > >> > >>> > during boot time, uefi itself or efistub? > >> > >>> > > >> > >>> > >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For > >> > >>> instance, on QEMU we have > >> > >>> > >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> > >>> 01000013) > >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> > >>> BXPC 00000001) > >> > >>> > >> > >>> covered by > >> > >>> > >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> > >>> ... > >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> > >> > >> > >> OK. I mistakenly understood those regions could be freed after exiting > >> > >> UEFI boot services. > >> > >> > >> > >>> > >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table > >> > >>> >> when booting the next kernel. > >> > >>> > > >> > >>> > not really. > >> > >>> > > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >> > >>> >> > on crash dump kernel?) > >> > >>> >> > > >> > >>> >> > >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > >> > >>> >> regions only revealed the bug, not created it (given that other > >> > >>> >> memblock_reserve regions may be affected as well) > >> > >>> > > >> > >>> > As whether we should honor such reserved regions over kexec'ing > >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. > >> > >>> > As a matter of fact, no information about "reserved" memblocks is > >> > >>> > exposed to user space (via proc/iomem). > >> > >>> > > >> > >>> > >> > >>> That is why I suggested (somewhere in this thread?) to not expose them > >> > >>> as 'System RAM'. Do you think that could solve this? > >> > >> > >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> > >> marking them under another name in /proc/iomem would also be good in order > >> > >> not to allocate them as part of crash kernel's memory. > >> > >> > >> > > > >> > > I agree. However, this may not be entirely trivial, since iterating > >> > > over the memblock_reserved table and creating iomem entries may result > >> > > in collisions. > >> > > >> > I found a method (using the patch I shared earlier in this thread) to mark these > >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or > >> > reserved regions. > >> > > >> > >> But I'm not still convinced that we should export them in useable- > >> > >> memory-range to crash dump kernel. They will be accessed through > >> > >> acpi_os_map_memory() and so won't be required to be part of system ram > >> > >> (or memblocks), I guess. > >> > > > >> > > Agreed. They will be covered by the linear mapping in the boot kernel, > >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > >> > > which is exactly what we want in this case. > >> > > >> > Now this is what is confusing me. I don't see the above happening. 
> >> > > >> > I see that the primary kernel boots up and adds the ACPI regions via: > >> > acpi_os_ioremap > >> > -> ioremap_cache > >> > > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls > >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > >> > variant. > > > > It is natural if that region is out of memblocks. > > Thanks for the confirmation. This was my understanding as well. > > >> > And it fails while accessing the ACPI tables: > >> > > >> > [ 0.039205] ACPI: Core revision 20170728 > >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > > As ioremap() makes the mapping as "Device memory", unaligned memory > > access won't be allowed. > > > >> > [ 0.100022] Modules linked in: > >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > >> > pstate: 60000045 > >> > [ 0.132647] sp : ffff000008ccfb40 > >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > >> > [ 0.194998] 
x7 : 0000000000000000 x6 : ffff000008ccfc50 > >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > >> > [ 0.223224] Call trace: > >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > >> > ffff0000095e3980 ffff000008ccfbe0 > >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > >> > ffff000008ccfc50 0000000000000000 > >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > >> > 00000000ffffff76 0000000000000006 > >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > >> > 000000000000038e 0000000000000000 > >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 > >> > 0000000000000005 000000000000001b > >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > >> > ffff000009710027 0000000000000001 > >> > [ 0.279667] fac0: 0000000000000001 000000000000001b > >> > 0000000000000000 ffff0000088be820 > >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > >> > ffff00000849b4f8 ffff000008ccfb40 > >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > >> > ffff000008ccfb40 ffff000008260a18 > >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > >> > ffff000008ccfb40 ffff0000084a6764 > >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > >> 
> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > >> > [ 0.399160] Kernel panic - not syncing: Fatal exception > >> > [ 0.404437] Rebooting in 10 seconds. > >> > > >> > So, I think the linear mapping done by the primary kernel does not > >> > make these accessible in the crash kernel directly. > >> > > >> > Any pointers? > >> > >> Can you get the code line number for acpi_ns_lookup+0x25c? > > > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned > > accesses? > > (I didn't find out how unaligned accesses could happen there.) > > > > Right. Like I captured somewhere in this thread (perhaps the first > email on this subject), > this is indeed an unaligned address access. > > Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding > assigning this memory range > as device memory doesn't seem a neat solution as it means we are not > marking some thing with the right memory attribute and we can fall in > similar/related issues later. > > Regarding the later suggestion, what I am seeing now is that the acpi > table access functions are perhaps reused from the earlier x86 > implementation, but on the arm64 (or even arm) arch we should not be > allowing unaligned accesses which might cause UNDEFINED behaviour and > resultant crash. > > So I can try going this approach and see if it works for me. 
> > However, I am still not very sure as to why the crashkernel ranges > historically do not include the System RAM regions (which may include > the ACPI regions as well). These regions are available for the kernel > usage and perhaps should be exported to the crashkernel as well. > > I am not fully aware of the previous discussions on capp'ing the > crashkernel memory being passed to the kdump kernel, but did we run > into any issues while doing so? > > Also, even if I extend the kexec-tools to modify the > linux,usable-memory-range and add the ACPI regions to it, the > crashkernel fails to boot with the below message (I have added some > logic to print the DTB on the crash kernel boot start): > > [ 0.000000] chosen { > [ 0.000000] linux,usable-memory-range > [ 0.000000] = < > [ 0.000000] 0x00000000 > [ 0.000000] 0x0e800000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x20000000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x396c0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x000a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x39770000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00040000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x398a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00020000 > [ 0.000000] > > [ 0.000000] ; > > [snip..] > > [ 0.000000] linux,usable-memory-range base e800000, size 20000000 > [ 0.000000] - e800000 , 20000000 > [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 > [ 0.000000] - 396c0000 , a0000 > [ 0.000000] linux,usable-memory-range base 39770000, size 40000 > [ 0.000000] - 39770000 , 40000 > [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 > [ 0.000000] - 398a0000 , 20000 > [ 0.000000] initrd not fully accessible via the linear mapping -- > please check your bootloader ... 
> [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 > arm64_memblock_init+0x210/0x484 > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 > [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 > [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 > [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 > [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] > pstate: 600000c5 > [ 0.000000] sp : ffff000008ccfe80 > [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 > [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 > [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 > [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 > [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 > [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 > [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 > [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 > [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 > [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 > [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 > [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d > [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 > [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 > [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 > [ 0.000000] Call trace: > [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) > [ 0.000000] fd40: 0000000000000056 0000000000000000 > 0000000000000000 0000000000000000 > [ 0.000000] fd60: 0000000000000001 ffff000008c96360 > 000000000000000d 746f6f622072756f > [ 0.000000] fd80: ffff000008517414 00000000000000f4 > 2065687420616976 6d207261656e696c > [ 0.000000] fda0: 2d20676e69707061 657361656c70202d > 79206b6365686320 000000002be00842 > [ 0.000000] fdc0: ffff000008d05580 0000000000000000 > 000000000c283806 ffff000008afa000 > [ 0.000000] fde0: ffff000008080000 
ffff000008afa000 > ffff000009680000 ffff000008ec0000 > [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 > 00000000013b0000 0000000011230000 > [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 > ffff000008b76984 ffff000008ccfe80 > [ 0.000000] fe40: ffff000008b76984 00000000600000c5 > ffff00000959b7a8 ffff000008ec0000 > [ 0.000000] fe60: ffffffffffffffff 0000000000000005 > ffff000008ccfe80 ffff000008b76984 > [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 > [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] random: get_random_bytes called from > print_oops_end_marker+0x50/0x6c with crng_init=0 > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr > [ 0.000000] cma: Failed to reserve 512 MiB > [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate > 0x0000000000010000 bytes below 0x0000000000000000. > [ 0.000000] > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W > ------------ 4.14.0+ #7 > [ 0.000000] Call trace: > [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c > [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c > [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 > [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 > [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c > [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 > [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 > [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 > [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to > allocate 0x0000000000010000 bytes below 0x0000000000000000. 
> [ 0.000000]
>
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
>
> Akashi, any pointers on this will be helpful as well.

Please show me:
* "Virtual kernel memory layout" in dmesg
* /proc/iomem
* debug messages from kexec-tools (kexec -d)

-Takahiro AKASHI

> Regards,
> Bhupesh
>
> >
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>
> >> > _______________________________________________
> >> > kexec mailing list -- kexec at lists.fedoraproject.org
> >> > To unsubscribe send an email to kexec-leave at lists.fedoraproject.org

^ permalink raw reply [flat|nested] 135+ messages in thread
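Note on the oops quoted in this message: both the ESR value (0x96000021) and the faulting instruction word shown in parentheses on the "Code:" line (b94002c0) can be decoded by hand. The sketch below is only an illustration and is not kernel code; all helper names are made up for this note. It relies on the ARMv8-A ESR_ELx layout (exception class in bits [31:26], data fault status code in bits [5:0] for data aborts) and on the A64 encoding of the 32-bit LDR (immediate, unsigned offset) instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exception class: ESR_ELx bits [31:26]. 0x25 is a data abort taken
 * without a change in exception level. */
static inline unsigned esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data fault status code: ESR_ELx bits [5:0]. 0x21 is an alignment fault. */
static inline unsigned esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* True if 'insn' is "LDR Wt, [Xn, #imm]" (32-bit, unsigned offset). */
static inline bool is_ldr32_uimm(uint32_t insn)
{
	return (insn & 0xffc00000u) == 0xb9400000u;
}

/* Base register number Xn: bits [9:5]. */
static inline unsigned ldr_rn(uint32_t insn)
{
	return (insn >> 5) & 0x1f;
}

/* Destination register number Wt: bits [4:0]. */
static inline unsigned ldr_rt(uint32_t insn)
{
	return insn & 0x1f;
}

/* Byte offset: imm12 in bits [21:10], scaled by the 4-byte access size. */
static inline unsigned ldr32_offset(uint32_t insn)
{
	return ((insn >> 10) & 0xfff) * 4;
}

/* Natural alignment check; 'size' must be a power of two. */
static inline bool is_aligned(uint64_t addr, unsigned size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (uint64_t)(size - 1)) == 0;
}
```

Applied to the values in the log, 0x96000021 gives EC 0x25 and DFSC 0x21, and b94002c0 decodes as "ldr w0, [x22]" with a zero offset. The register dump shows x22 = ffff000009710027, which is not 4-byte aligned, so the faulting access appears to be a 32-bit load from an odd address inside the ioremap()'d ACPI table. That is consistent with the reading above that a Device memory mapping does not permit unaligned accesses.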
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 11:18 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-18 11:18 UTC (permalink / raw) To: Bhupesh SHARMA Cc: Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, linux-efi, Mark Rutland, Matt Fleming Bhupesh, On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: > On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: > > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: > >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it > >> to kexec@lists.infradead.org > >> > >> Also add linux-acpi list > > > > Thank you. > > > >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > >> > <ard.biesheuvel@linaro.org> wrote: > >> > > On 15 December 2017 at 09:59, AKASHI Takahiro > >> > > <takahiro.akashi@linaro.org> wrote: > >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >> > >>> <takahiro.akashi@linaro.org> wrote: > >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> > >>> >> <takahiro.akashi@linaro.org> wrote: > >> > >>> >> > Bhupesh, Ard, > >> > >>> >> > > >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> > >>> >> >> Hi Ard, Akashi > >> > >>> >> >> > >> > >>> >> > (snip) > >> > >>> >> > > >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any > >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. 
> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> > >>> >> >> , for details) > >> > >>> >> > > >> > >>> >> > Right. > >> > >>> >> > > >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >> > >>> >> >> with the crashkernel memory range: > >> > >>> >> >> > >> > >>> >> >> /* add linux,usable-memory-range */ > >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> > >>> >> >> address_cells, size_cells); > >> > >>> >> >> > >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> > >>> >> >> , for details) > >> > >>> >> >> > >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> > >>> >> >> they are marked as System RAM or as RESERVED. As, > >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> > >>> >> >> > >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> > >>> >> >> ACPI memory and crashes while trying to access the same: > >> > >>> >> >> > >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> > >>> >> >> -r`.img --reuse-cmdline -d > >> > >>> >> >> > >> > >>> >> >> [snip..] 
> >> > >>> >> >> > >> > >>> >> >> Reserved memory range > >> > >>> >> >> 000000000e800000-000000002e7fffff (0) > >> > >>> >> >> > >> > >>> >> >> Coredump memory ranges > >> > >>> >> >> 0000000000000000-000000000e7fffff (0) > >> > >>> >> >> 000000002e800000-000000003961ffff (0) > >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) > >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) > >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) > >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) > >> > >>> >> >> 000000a000000000-000000affbffffff (0) > >> > >>> >> >> > >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the > >> > >>> >> >> memory cap'ing passed to the crash kernel inside > >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): > >> > >>> >> >> > >> > >>> >> >> static void __init fdt_enforce_memory_region(void) > >> > >>> >> >> { > >> > >>> >> >> struct memblock_region reg = { > >> > >>> >> >> .size = 0, > >> > >>> >> >> }; > >> > >>> >> >> > >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> > >>> >> >> > >> > >>> >> >> if (reg.size) > >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> > >>> >> >> comment this out */ > >> > >>> >> >> } > >> > >>> >> > > >> > >>> >> > Please just don't do that. It can cause a fatal damage on > >> > >>> >> > memory contents of the *crashed* kernel. > >> > >>> >> > > >> > >>> >> >> 5). Both the above temporary solutions fix the problem. > >> > >>> >> >> > >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> > >>> >> >> fail. > >> > >>> >> >> > >> > >>> >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are > >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> > >>> >> >> dt node 'linux,usable-memory-range' > >> > >>> >> > > >> > >>> >> > I still don't understand why we need to carry over the information > >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> > >>> >> > such regions are free to be reused by the kernel after some point of > >> > >>> >> > initialization. Why does crash dump kernel need to know about them? > >> > >>> >> > > >> > >>> >> > >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after > >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec > >> > >>> >> kernel, those regions needs to be preserved, which is why they are > >> > >>> >> memblock_reserve()'d now. > >> > >>> > > >> > >>> > For my better understandings, who is actually accessing such regions > >> > >>> > during boot time, uefi itself or efistub? > >> > >>> > > >> > >>> > >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For > >> > >>> instance, on QEMU we have > >> > >>> > >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> > >>> 01000013) > >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> > >>> BXPC 00000001) > >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> > >>> BXPC 00000001) > >> > >>> > >> > >>> covered by > >> > >>> > >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> > >>> ... > >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> > >> > >> > >> OK. I mistakenly understood those regions could be freed after exiting > >> > >> UEFI boot services. > >> > >> > >> > >>> > >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table > >> > >>> >> when booting the next kernel. > >> > >>> > > >> > >>> > not really. > >> > >>> > > >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >> > >>> >> > on crash dump kernel?) > >> > >>> >> > > >> > >>> >> > >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim > >> > >>> >> regions only revealed the bug, not created it (given that other > >> > >>> >> memblock_reserve regions may be affected as well) > >> > >>> > > >> > >>> > As whether we should honor such reserved regions over kexec'ing > >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. > >> > >>> > As a matter of fact, no information about "reserved" memblocks is > >> > >>> > exposed to user space (via proc/iomem). > >> > >>> > > >> > >>> > >> > >>> That is why I suggested (somewhere in this thread?) to not expose them > >> > >>> as 'System RAM'. Do you think that could solve this? > >> > >> > >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> > >> marking them under another name in /proc/iomem would also be good in order > >> > >> not to allocate them as part of crash kernel's memory. > >> > >> > >> > > > >> > > I agree. However, this may not be entirely trivial, since iterating > >> > > over the memblock_reserved table and creating iomem entries may result > >> > > in collisions. > >> > > >> > I found a method (using the patch I shared earlier in this thread) to mark these > >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or > >> > reserved regions. > >> > > >> > >> But I'm not still convinced that we should export them in useable- > >> > >> memory-range to crash dump kernel. They will be accessed through > >> > >> acpi_os_map_memory() and so won't be required to be part of system ram > >> > >> (or memblocks), I guess. > >> > > > >> > > Agreed. They will be covered by the linear mapping in the boot kernel, > >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, > >> > > which is exactly what we want in this case. > >> > > >> > Now this is what is confusing me. I don't see the above happening. 
> >> > > >> > I see that the primary kernel boots up and adds the ACPI regions via: > >> > acpi_os_ioremap > >> > -> ioremap_cache > >> > > >> > But during the crashkernel boot, ''acpi_os_ioremap' calls > >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > >> > variant. > > > > It is natural if that region is out of memblocks. > > Thanks for the confirmation. This was my understanding as well. > > >> > And it fails while accessing the ACPI tables: > >> > > >> > [ 0.039205] ACPI: Core revision 20170728 > >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > > > > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. > > As ioremap() makes the mapping as "Device memory", unaligned memory > > access won't be allowed. > > > >> > [ 0.100022] Modules linked in: > >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > >> > pstate: 60000045 > >> > [ 0.132647] sp : ffff000008ccfb40 > >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > >> > [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > >> > [ 0.194998] 
x7 : 0000000000000000 x6 : ffff000008ccfc50 > >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > >> > [ 0.223224] Call trace: > >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 > >> > ffff0000095e3980 ffff000008ccfbe0 > >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > >> > ffff000008ccfc50 0000000000000000 > >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f > >> > 00000000ffffff76 0000000000000006 > >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > >> > 000000000000038e 0000000000000000 > >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 > >> > 0000000000000005 000000000000001b > >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > >> > ffff000009710027 0000000000000001 > >> > [ 0.279667] fac0: 0000000000000001 000000000000001b > >> > 0000000000000000 ffff0000088be820 > >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > >> > ffff00000849b4f8 ffff000008ccfb40 > >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > >> > ffff000008ccfb40 ffff000008260a18 > >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > >> > ffff000008ccfb40 ffff0000084a6764 > >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > >> 
> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > >> > [ 0.399160] Kernel panic - not syncing: Fatal exception > >> > [ 0.404437] Rebooting in 10 seconds. > >> > > >> > So, I think the linear mapping done by the primary kernel does not > >> > make these accessible in the crash kernel directly. > >> > > >> > Any pointers? > >> > >> Can you get the code line number for acpi_ns_lookup+0x25c? > > > > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or > > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned > > accesses? > > (I didn't find out how unaligned accesses could happen there.) > > > > Right. Like I captured somewhere in this thread (perhaps the first > email on this subject), > this is indeed an unaligned address access. > > Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding > assigning this memory range > as device memory doesn't seem a neat solution as it means we are not > marking some thing with the right memory attribute and we can fall in > similar/related issues later. > > Regarding the later suggestion, what I am seeing now is that the acpi > table access functions are perhaps reused from the earlier x86 > implementation, but on the arm64 (or even arm) arch we should not be > allowing unaligned accesses which might cause UNDEFINED behaviour and > resultant crash. > > So I can try going this approach and see if it works for me. 
> > However, I am still not very sure as to why the crashkernel ranges > historically do not include the System RAM regions (which may include > the ACPI regions as well). These regions are available for the kernel > usage and perhaps should be exported to the crashkernel as well. > > I am not fully aware of the previous discussions on capp'ing the > crashkernel memory being passed to the kdump kernel, but did we run > into any issues while doing so? > > Also, even if I extend the kexec-tools to modify the > linux,usable-memory-range and add the ACPI regions to it, the > crashkernel fails to boot with the below message (I have added some > logic to print the DTB on the crash kernel boot start): > > [ 0.000000] chosen { > [ 0.000000] linux,usable-memory-range > [ 0.000000] = < > [ 0.000000] 0x00000000 > [ 0.000000] 0x0e800000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x20000000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x396c0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x000a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x39770000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00040000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x398a0000 > [ 0.000000] 0x00000000 > [ 0.000000] 0x00020000 > [ 0.000000] > > [ 0.000000] ; > > [snip..] > > [ 0.000000] linux,usable-memory-range base e800000, size 20000000 > [ 0.000000] - e800000 , 20000000 > [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 > [ 0.000000] - 396c0000 , a0000 > [ 0.000000] linux,usable-memory-range base 39770000, size 40000 > [ 0.000000] - 39770000 , 40000 > [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 > [ 0.000000] - 398a0000 , 20000 > [ 0.000000] initrd not fully accessible via the linear mapping -- > please check your bootloader ... 
> [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 > arm64_memblock_init+0x210/0x484 > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 > [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 > [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 > [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 > [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] > pstate: 600000c5 > [ 0.000000] sp : ffff000008ccfe80 > [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 > [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 > [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 > [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 > [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 > [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 > [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 > [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 > [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 > [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 > [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 > [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d > [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 > [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 > [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 > [ 0.000000] Call trace: > [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) > [ 0.000000] fd40: 0000000000000056 0000000000000000 > 0000000000000000 0000000000000000 > [ 0.000000] fd60: 0000000000000001 ffff000008c96360 > 000000000000000d 746f6f622072756f > [ 0.000000] fd80: ffff000008517414 00000000000000f4 > 2065687420616976 6d207261656e696c > [ 0.000000] fda0: 2d20676e69707061 657361656c70202d > 79206b6365686320 000000002be00842 > [ 0.000000] fdc0: ffff000008d05580 0000000000000000 > 000000000c283806 ffff000008afa000 > [ 0.000000] fde0: ffff000008080000 
ffff000008afa000 > ffff000009680000 ffff000008ec0000 > [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 > 00000000013b0000 0000000011230000 > [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 > ffff000008b76984 ffff000008ccfe80 > [ 0.000000] fe40: ffff000008b76984 00000000600000c5 > ffff00000959b7a8 ffff000008ec0000 > [ 0.000000] fe60: ffffffffffffffff 0000000000000005 > ffff000008ccfe80 ffff000008b76984 > [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 > [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] random: get_random_bytes called from > print_oops_end_marker+0x50/0x6c with crng_init=0 > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr > [ 0.000000] cma: Failed to reserve 512 MiB > [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate > 0x0000000000010000 bytes below 0x0000000000000000. > [ 0.000000] > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W > ------------ 4.14.0+ #7 > [ 0.000000] Call trace: > [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c > [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c > [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 > [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 > [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c > [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 > [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 > [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 > [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to > allocate 0x0000000000010000 bytes below 0x0000000000000000. 
> [ 0.000000] > > I guess it is because of the 1G alignment requirement between the > kernel image and the initrd and how we populate the holes between the > kernel image, segments (including dtb) and the initrd from the > kexec-tools. > > Akashi, any pointers on this will be helpful as well. Please show me: * "Virtual kernel memory layout" in dmesg * /proc/iomem * debug messages from kexec-tools (kexec -d) -Takahiro AKASHI > Regards, > Bhupesh > > > >> > > >> > Regards, > >> > Bhupesh > >> > > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> > >> via a kernel command line parameter, "memmap=".
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-18 11:18 ` AKASHI Takahiro (?) (?) @ 2017-12-18 22:28 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, linux-efi, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > Bhupesh, > > On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: >> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it >> >> to kexec@lists.infradead.org >> >> >> >> Also add linux-acpi list >> > >> > Thank you. >> > >> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> >> > <ard.biesheuvel@linaro.org> wrote: >> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> >> > > <takahiro.akashi@linaro.org> wrote: >> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> >> > >>> <takahiro.akashi@linaro.org> wrote: >> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> > >>> >> <takahiro.akashi@linaro.org> wrote: >> >> > >>> >> > Bhupesh, Ard, >> >> > >>> >> > >> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> > >>> >> >> Hi Ard, Akashi >> >> > >>> >> >> >> >> > >>> >> > (snip) >> >> > >>> >> > >> >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel 
to >> >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any >> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. >> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> > >>> >> >> , for details) >> >> > >>> >> > >> >> > >>> >> > Right. >> >> > >>> >> > >> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >> > >>> >> >> with the crashkernel memory range: >> >> > >>> >> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> > >>> >> >> address_cells, size_cells); >> >> > >>> >> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> > >>> >> >> , for details) >> >> > >>> >> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> > >>> >> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> >> > >>> >> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> > >>> >> >> -r`.img --reuse-cmdline -d >> >> > >>> >> >> >> >> > >>> >> >> [snip..] 
>> >> > >>> >> >> >> >> > >>> >> >> Reserved memory range >> >> > >>> >> >> 000000000e800000-000000002e7fffff (0) >> >> > >>> >> >> >> >> > >>> >> >> Coredump memory ranges >> >> > >>> >> >> 0000000000000000-000000000e7fffff (0) >> >> > >>> >> >> 000000002e800000-000000003961ffff (0) >> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) >> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) >> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) >> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) >> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) >> >> > >>> >> >> 000000a000000000-000000affbffffff (0) >> >> > >>> >> >> >> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> >> > >>> >> >> memory cap'ing passed to the crash kernel inside >> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): >> >> > >>> >> >> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) >> >> > >>> >> >> { >> >> > >>> >> >> struct memblock_region reg = { >> >> > >>> >> >> .size = 0, >> >> > >>> >> >> }; >> >> > >>> >> >> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> >> > >>> >> >> >> >> > >>> >> >> if (reg.size) >> >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> >> > >>> >> >> comment this out */ >> >> > >>> >> >> } >> >> > >>> >> > >> >> > >>> >> > Please just don't do that. It can cause a fatal damage on >> >> > >>> >> > memory contents of the *crashed* kernel. >> >> > >>> >> > >> >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >> >> > >>> >> >> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> >> > >>> >> >> fail. >> >> > >>> >> >> >> >> > >>> >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are >> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> > >>> >> >> dt node 'linux,usable-memory-range' >> >> > >>> >> > >> >> > >>> >> > I still don't understand why we need to carry over the information >> >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> > >>> >> > such regions are free to be reused by the kernel after some point of >> >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after >> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> >> > >>> >> memblock_reserve()'d now. >> >> > >>> > >> >> > >>> > For my better understandings, who is actually accessing such regions >> >> > >>> > during boot time, uefi itself or efistub? >> >> > >>> > >> >> > >>> >> >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For >> >> > >>> instance, on QEMU we have >> >> > >>> >> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> > >>> 01000013) >> >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> >> >> > >>> covered by >> >> > >>> >> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> > >>> ... >> >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> > >> >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> >> > >> UEFI boot services. >> >> > >> >> >> > >>> >> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> >> > >>> >> when booting the next kernel. >> >> > >>> > >> >> > >>> > not really. >> >> > >>> > >> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> >> > >>> >> > on crash dump kernel?) >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> > >>> >> regions only revealed the bug, not created it (given that other >> >> > >>> >> memblock_reserve regions may be affected as well) >> >> > >>> > >> >> > >>> > As whether we should honor such reserved regions over kexec'ing >> >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. >> >> > >>> > As a matter of fact, no information about "reserved" memblocks is >> >> > >>> > exposed to user space (via proc/iomem). >> >> > >>> > >> >> > >>> >> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> >> > >>> as 'System RAM'. Do you think that could solve this? >> >> > >> >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> >> > >> marking them under another name in /proc/iomem would also be good in order >> >> > >> not to allocate them as part of crash kernel's memory. >> >> > >> >> >> > > >> >> > > I agree. However, this may not be entirely trivial, since iterating >> >> > > over the memblock_reserved table and creating iomem entries may result >> >> > > in collisions. >> >> > >> >> > I found a method (using the patch I shared earlier in this thread) to mark these >> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> >> > reserved regions. >> >> > >> >> > >> But I'm not still convinced that we should export them in useable- >> >> > >> memory-range to crash dump kernel. They will be accessed through >> >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> >> > >> (or memblocks), I guess. >> >> > > >> >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> >> > > which is exactly what we want in this case. >> >> > >> >> > Now this is what is confusing me. I don't see the above happening. 
>> >> > >> >> > I see that the primary kernel boots up and adds the ACPI regions via: >> >> > acpi_os_ioremap >> >> > -> ioremap_cache >> >> > >> >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> >> > variant. >> > >> > It is natural if that region is out of memblocks. >> >> Thanks for the confirmation. This was my understanding as well. >> >> >> > And it fails while accessing the ACPI tables: >> >> > >> >> > [ 0.039205] ACPI: Core revision 20170728 >> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP >> > >> > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. >> > As ioremap() makes the mapping as "Device memory", unaligned memory >> > access won't be allowed. >> > >> >> > [ 0.100022] Modules linked in: >> >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> >> > pstate: 60000045 >> >> > [ 0.132647] sp : ffff000008ccfb40 >> >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> >> > [ 0.189634] x9 : 000000000000005f 
x8 : ffff8000126d0140 >> >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 >> >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> >> > [ 0.223224] Call trace: >> >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> >> > ffff0000095e3980 ffff000008ccfbe0 >> >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> >> > ffff000008ccfc50 0000000000000000 >> >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> >> > 00000000ffffff76 0000000000000006 >> >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> >> > 000000000000038e 0000000000000000 >> >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> >> > 0000000000000005 000000000000001b >> >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> >> > ffff000009710027 0000000000000001 >> >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> >> > 0000000000000000 ffff0000088be820 >> >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> >> > ffff00000849b4f8 ffff000008ccfb40 >> >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> >> > ffff000008ccfb40 ffff000008260a18 >> >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> >> > ffff000008ccfb40 ffff0000084a6764 >> >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 
>> >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> >> > [ 0.404437] Rebooting in 10 seconds. >> >> > >> >> > So, I think the linear mapping done by the primary kernel does not >> >> > make these accessible in the crash kernel directly. >> >> > >> >> > Any pointers? >> >> >> >> Can you get the code line number for acpi_ns_lookup+0x25c? >> > >> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or >> > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned >> > accesses? >> > (I didn't find out how unaligned accesses could happen there.) >> > >> >> Right. Like I captured somewhere in this thread (perhaps the first >> email on this subject), >> this is indeed an unaligned address access. >> >> Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding >> assigning this memory range >> as device memory doesn't seem a neat solution as it means we are not >> marking some thing with the right memory attribute and we can fall in >> similar/related issues later. >> >> Regarding the later suggestion, what I am seeing now is that the acpi >> table access functions are perhaps reused from the earlier x86 >> implementation, but on the arm64 (or even arm) arch we should not be >> allowing unaligned accesses which might cause UNDEFINED behaviour and >> resultant crash. >> >> So I can try going this approach and see if it works for me. 
>> >> However, I am still not very sure as to why the crashkernel ranges >> historically do not include the System RAM regions (which may include >> the ACPI regions as well). These regions are available for the kernel >> usage and perhaps should be exported to the crashkernel as well. >> >> I am not fully aware of the previous discussions on capp'ing the >> crashkernel memory being passed to the kdump kernel, but did we run >> into any issues while doing so? >> >> Also, even if I extend the kexec-tools to modify the >> linux,usable-memory-range and add the ACPI regions to it, the >> crashkernel fails to boot with the below message (I have added some >> logic to print the DTB on the crash kernel boot start): >> >> [ 0.000000] chosen { >> [ 0.000000] linux,usable-memory-range >> [ 0.000000] = < >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x0e800000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x20000000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x396c0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x000a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x39770000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00040000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x398a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00020000 >> [ 0.000000] > >> [ 0.000000] ; >> >> [snip..] >> >> [ 0.000000] linux,usable-memory-range base e800000, size 20000000 >> [ 0.000000] - e800000 , 20000000 >> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 >> [ 0.000000] - 396c0000 , a0000 >> [ 0.000000] linux,usable-memory-range base 39770000, size 40000 >> [ 0.000000] - 39770000 , 40000 >> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 >> [ 0.000000] - 398a0000 , 20000 >> [ 0.000000] initrd not fully accessible via the linear mapping -- >> please check your bootloader ... 
>> [ 0.000000] ------------[ cut here ]------------
>> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [ 0.000000] Modules linked in:
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [ 0.000000] sp : ffff000008ccfe80
>> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [ 0.000000] Call trace:
>> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [ 0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [ 0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [ 0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [ 0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [ 0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [ 0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [ 0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [ 0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [ 0.000000] cma: Failed to reserve 512 MiB
>> [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [ 0.000000]
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W
>> ------------ 4.14.0+ #7
>> [ 0.000000] Call trace:
>> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [ 0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
> * "Virtual kernel memory layout" in dmesg
> * /proc/iomem
> * debug messages from kexec-tools (kexec -d)

Here are the changes I have made so far in the kernel and kexec-tools
so that the ACPI reclaim regions show up as identifiable regions in
'/proc/iomem' and get appended to the DTB property
linux,usable-memory-range:

Linux patches:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>, and
<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>

Note that I am not very clear about the hole margins that kexec-tools
adds (so that the crashkernel's expectation that the kernel image and
initrd lie within a 1G boundary is met), so I have not pushed my
temporary changes for that part to the github code. Any suggestions on
how to put them in place correctly would be appreciated.
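
To make the intent of the kexec-tools change concrete, here is a minimal
Python sketch (not the actual patch, which is in C) of how the
"Crash kernel" and "ACPI reclaim region" entries of /proc/iomem could be
turned into the cells seen in the linux,usable-memory-range dump quoted
above. The helper names usable_ranges() and to_cells() are made up for
illustration, and the sketch assumes #address-cells = #size-cells = 2,
which is what the dumped property suggests.

```python
import re

# Labels whose ranges are handed to the crash kernel.
# Assumption for this sketch: only these two labels matter.
USABLE_LABELS = ("Crash kernel", "ACPI reclaim region")

# One /proc/iomem entry: "<start>-<end> : <label>", possibly indented.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def usable_ranges(iomem_text, labels=USABLE_LABELS):
    """Return sorted (base, size) pairs for entries carrying one of the labels."""
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if not match:
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        if match.group(3) in labels:
            # The end address in /proc/iomem is inclusive.
            ranges.append((start, end - start + 1))
    return sorted(ranges)


def to_cells(ranges, address_cells=2, size_cells=2):
    """Flatten (base, size) pairs into 32-bit cells, most significant cell first."""
    cells = []
    for base, size in ranges:
        for value, ncells in ((base, address_cells), (size, size_cells)):
            for shift in range(32 * (ncells - 1), -1, -32):
                cells.append((value >> shift) & 0xFFFFFFFF)
    return cells


# Excerpt of the /proc/iomem output from this machine.
SAMPLE = """\
00000000-3961ffff : System RAM
  0e800000-2e7fffff : Crash kernel
39620000-396bffff : reserved
396c0000-3975ffff : ACPI reclaim region
39760000-3976ffff : reserved
39770000-397affff : ACPI reclaim region
397b0000-3989ffff : reserved
398a0000-398bffff : ACPI reclaim region
"""

if __name__ == "__main__":
    for base, size in usable_ranges(SAMPLE):
        print(f"base {base:x}, size {size:x}")
    print(" ".join(f"0x{cell:08x}" for cell in to_cells(usable_ranges(SAMPLE))))
```

On the excerpt above this yields the same four base/size pairs that the
crash kernel reported while parsing the property (e800000/20000000,
396c0000/a0000, 39770000/40000, 398a0000/20000).
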
And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+ root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200
[ 0.000000] PCIe ASPM is disabled
[ 0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB) mapped at [ (ptrval)- (ptrval)]
[ 0.000000] Memory: 267251520K/268169216K available (7868K kernel code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K reserved, 0K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000 ( 128 MB)
[ 0.000000] vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000 (126847 GB)
[ 0.000000] .text : 0x (ptrval) - 0x (ptrval) ( 7872 KB)
[ 0.000000] .rodata : 0x (ptrval) - 0x (ptrval) ( 3392 KB)
[ 0.000000] .init : 0x (ptrval) - 0x (ptrval) ( 1280 KB)
[ 0.000000] .data : 0x (ptrval) - 0x (ptrval) ( 1765 KB)
[ 0.000000] .bss : 0x (ptrval) - 0x (ptrval) ( 7728 KB)
[ 0.000000] fixed : 0xffff7fdffe7b0000 - 0xffff7fdffec00000 ( 4416 KB)
[ 0.000000] PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000 ( 16 MB)
[ 0.000000] vmemmap : 0xffff7fe000000000 - 0xffff800000000000 ( 128 GB maximum)
[ 0.000000] 0xffff7fe000000000 - 0xffff7fe02bff0000 ( 703 MB actual)
[ 0.000000] memory : 0xffff800000000000 - 0xffff80affc000000 (720832 MB)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[ 0.000000] ftrace: allocating 29903 entries in 8 pages
[ 0.000000] Hierarchical RCU implementation.

(2) # cat /proc/iomem 00000000-3961ffff : System RAM 00080000-00b7ffff : Kernel code 00cc0000-0166ffff : Kernel data 0e800000-2e7fffff : Crash kernel 39620000-396bffff : reserved 396c0000-3975ffff : ACPI reclaim region 39760000-3976ffff : reserved 39770000-397affff : ACPI reclaim region 397b0000-3989ffff : reserved 398a0000-398bffff : ACPI reclaim region 398c0000-39d3ffff : reserved 39d40000-3ed2ffff : System RAM 3ed30000-3ed5ffff : reserved 3ed60000-3fbfffff : System RAM 40500000-40500fff : sbsa-gwdt.0 40500000-40500fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 60080000-6008ffff : HISI0152:00 602b0000-602b0fff : ARMH0011:00 602b0000-602b0fff : ARMH0011:00 603c0000-603cffff : HISI0141:00 603c0000-603cffff : HISI0141:00 a0080000-a008ffff : HISI0152:05 a0080000-a008ffff : HISI0152:04 a0080000-a008ffff : HISI0152:03 a00a0000-a00affff : pnp 00:01 a01b0000-a01b0fff : HISI0191:00 a2000000-a200ffff : HISI0162:01 a2000000-a200ffff : HISI0162:01 a3000000-a300ffff : HISI0162:02 a3000000-a300ffff : HISI0162:02 a7020000-a702ffff : PNP0D20:00 a7020000-a702ffff : PNP0D20:00 b0000000-be7fffff : PCI Bus 0002:e8 b0000000-b06fffff : PCI Bus 0002:e9 b0000000-b00fffff : 0002:e9:00.0 b0000000-b00fffff : igb b0100000-b01fffff : 0002:e9:00.0 b0200000-b02fffff : 0002:e9:00.1 b0200000-b02fffff : igb b0300000-b03fffff : 0002:e9:00.1 b0400000-b04fffff : 0002:e9:00.2 b0400000-b04fffff : igb b0500000-b05fffff : 0002:e9:00.3 b0500000-b05fffff : igb b0600000-b0603fff : 0002:e9:00.0 b0600000-b0603fff : igb b0604000-b0607fff : 0002:e9:00.1 b0604000-b0607fff : igb b0608000-b060bfff : 0002:e9:00.2 b0608000-b060bfff : igb b060c000-b060ffff : 0002:e9:00.3 b060c000-b060ffff : igb b0700000-b0afffff : PCI Bus 0002:e9 b0700000-b077ffff : 0002:e9:00.0 b0780000-b07fffff : 0002:e9:00.0 b0800000-b087ffff : 0002:e9:00.1 b0880000-b08fffff : 0002:e9:00.1 b0900000-b097ffff : 0002:e9:00.2 b0980000-b09fffff : 0002:e9:00.2 b0a00000-b0a7ffff : 0002:e9:00.3 b0a80000-b0afffff : 
0002:e9:00.3 b0b00000-b0b0ffff : 0002:e8:00.0 be800000-beffffff : PCI ECAM c0080000-c008ffff : HISI0152:02 c0080000-c008ffff : HISI0152:01 c3000000-c300ffff : HISI0162:00 c3000000-c300ffff : HISI0162:00 c5000000-c588ffff : HISI00B2:00 c5000000-c588ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 d0080000-d008ffff : HISI0152:07 d0080000-d008ffff : HISI0152:06 d0100000-d010ffff : HISI02A1:00 d0100000-d010ffff : HISI02A1:00 400000000-4007fffff : PCI ECAM 440000000-4ffffffff : PCI Bus 0005:00 440000000-4407fffff : PCI Bus 0005:01 440000000-4403fffff : 0005:01:00.0 440400000-4407fffff : 0005:01:00.1 440800000-4421fffff : PCI Bus 0005:01 440800000-440bfffff : 0005:01:00.0 440800000-440bfffff : ixgbe 440c00000-440ffffff : 0005:01:00.1 440c00000-440ffffff : ixgbe 441000000-4413fffff : 0005:01:00.0 441400000-4417fffff : 0005:01:00.0 441800000-441bfffff : 0005:01:00.1 441c00000-441ffffff : 0005:01:00.1 442000000-442003fff : 0005:01:00.0 442000000-442003fff : ixgbe 442004000-442007fff : 0005:01:00.1 442004000-442007fff : ixgbe 442200000-442200fff : 0005:00:00.0 700090000-70009ffff : pnp 00:03 7000a0000-7000affff : pnp 00:05 7000b0000-7000bffff : pnp 00:06 700200000-70020ffff : pnp 00:04 740800000-740ffffff : PCI ECAM 741000000-77ffeffff : PCI Bus 0006:08 741000000-74100ffff : 0006:08:00.0 784000000-7847fffff : PCI ECAM 784800000-7bffeffff : PCI Bus 0007:40 784800000-7849fffff : PCI Bus 0007:41 784800000-7849fffff : 0007:41:00.0 786000000-787ffffff : PCI Bus 0007:41 786000000-787ffffff : 0007:41:00.0 7c4800000-7c4ffffff : PCI ECAM 7c5000000-7fffeffff : PCI Bus 0004:48 7c5000000-7c51fffff : PCI Bus 0004:49 7c5000000-7c50fffff : 0004:49:00.0 7c5100000-7c513ffff : 0004:49:00.0 7c5100000-7c513ffff : mpt3sas 7c5140000-7c514ffff : 0004:49:00.0 7c5140000-7c514ffff : mpt3sas 7c5200000-7c520ffff : 0004:48:00.0 1040000000-1ffbffffff : System RAM 2000000000-2ffbffffff : System RAM 9000000000-9ffbffffff : System RAM a000000000-affbffffff : System RAM 
400c0080000-400c008ffff : HISI0152:08 600a00a0000-600a00affff : pnp 00:08 64001000000-64001ffffff : PCI ECAM 65040000000-650ffffffff : PCI Bus 000a:10 65040000000-6504000ffff : 000a:10:00.0 700a0090000-700a009ffff : pnp 00:0a 700a0200000-700a020ffff : pnp 00:0b 74002000000-74002ffffff : PCI ECAM 75040000000-750ffffffff : PCI Bus 000c:20 75040000000-7504000ffff : 000c:20:00.0 78003000000-78003ffffff : PCI ECAM 79040000000-790ffffffff : PCI Bus 000d:30 79040000000-79040000fff : 000d:30:00.0 (3) # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d arch_process_options:149: command_line: root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200 arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img arch_process_options:152: dtb: (null) Try gzip decompression. 
kernel: 0xffff968d0010 kernel_size: 0xdf9200 get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM elf_arm64_probe: Not an ELF executable. image_arm64_load: kernel_segment: 000000000e800000 image_arm64_load: text_offset: 0000000000080000 image_arm64_load: image_size: 00000000015f0000 image_arm64_load: phys_offset: 0000000000000000 image_arm64_load: vp_offset: ffffffffffffffff image_arm64_load: PE format: yes Reserved memory range 000000000e800000-000000002e7fffff (0) Coredump memory ranges 0000000000000000-000000000e7fffff (0) 000000002e800000-000000003961ffff (0) 0000000039d40000-000000003ed2ffff (0) 000000003ed60000-000000003fbfffff (0) 0000001040000000-0000001ffbffffff (0) 0000002000000000-0000002ffbffffff (0) 0000009000000000-0000009ffbffffff (0) 000000a000000000-000000affbffffff (0) ACPI reclaim memory ranges 00000000396c0000-000000003975ffff (0) 0000000039770000-00000000397affff (0) 00000000398a0000-00000000398bffff (0) crashkernel memory ranges 000000000e800000-000000002e7fffff (0) 00000000396c0000-000000003975ffff (0) 0000000039770000-00000000397affff (0) 00000000398a0000-00000000398bffff (0) kernel symbol _text 
vaddr = ffff000008080000 load_crashdump_segments: page_offset: ffff800000000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 
0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 
0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr = 0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000 Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000 p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000 Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000 p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000 Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000 p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000 Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000 p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz = 0xfbc000000 Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000 p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000 p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000 p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 
0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 
0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000 p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000 Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000 p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000 Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000 p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000 load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff read_1st_dtb: found /sys/firmware/fdt get_cells_size: 
#address-cells:2 #size-cells:2 cells_size_fitted: 2e7f0000-2e7f0fff cells_size_fitted: e800000-2e7fffff cells_size_fitted: 396c0000-3975ffff cells_size_fitted: 39770000-397affff cells_size_fitted: 398a0000-398bffff / { #size-cells = <0x00000002>; #address-cells = <0x00000002>; chosen { linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000 0x20000000 0x00000000 0x396c0000 0x00000000 0x000a0000 0x00000000 0x39770000 0x00000000 0x00040000 0x00000000 0x398a0000 0x00000000 0x00020000>; linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>; linux,uefi-mmap-desc-ver = <0x00000001>; linux,uefi-mmap-desc-size = <0x00000030>; linux,uefi-mmap-size = <0x00000e40>; linux,uefi-mmap-start = <0x00000000 0x30288018>; linux,uefi-system-table = <0x00000000 0x3ed50018>; bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200"; linux,initrd-end = <0x00000000 0x2fbff9e0>; linux,initrd-start = <0x00000000 0x2e84d000>; }; }; initrd: base fe70000, size 13b29e0h (20654560), end 112229e0 [snip..] 
sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c sym: sha256_starts value: 11240eb0 addr: 11240018 machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6 sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c sym: sha256_update value: 11245158 addr: 11240034 machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449 sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc sym: sha256_finish value: 11245164 addr: 11240050 machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445 sym: memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34 sym: memcmp value: 11240634 addr: 11240060 machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240070 machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240078 machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240088 machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400a8 machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400b0 machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400c0 machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400d4 machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112453a8 addr: 112400f0 
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245338 addr: 112400f8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245358 addr: 11240100 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245368 addr: 11240108 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 1124536e addr: 11240110 machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245370 addr: 11240118 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 1124012c machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106 sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4 sym: setup_arch value: 11240ea8 addr: 11240130 machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0 sym: verify_sha256_digest value: 11240000 addr: 11240134 machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3 sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4 sym: post_verification_setup_arch value: 11240ea4 addr: 11240144 machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245380 addr: 11240148 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 112401ac 
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240220 machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240478 machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245392 addr: 112404b8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 11240538 machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 112405c8 machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2 sym: purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28 sym: purgatory value: 11240120 addr: 11240678 machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8 sym: arm64_kernel_entry value: 112454c8 addr: 1124067c machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271 sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8 sym: arm64_dtb_addr value: 112454d0 addr: 11240680 machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 112450bc machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98 sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245118 machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245130 machine_apply_elf_rel: CALL26 
aa1503e094000000->aa1503e097ffed39 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 1124513c machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78 sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112454d8 addr: 11245330 machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8 kexec_load: entry = 0x11240670 flags = 0xb70001 nr_segments = 5 segment[0].buf = 0xffff968d0010 segment[0].bufsz = 0xdf9200 segment[0].mem = 0xe880000 segment[0].memsz = 0x15f0000 segment[1].buf = 0xffff950e0010 segment[1].bufsz = 0x13b29e0 segment[1].mem = 0xfe70000 segment[1].memsz = 0x13c0000 segment[2].buf = 0x1115b440 segment[2].bufsz = 0x33d segment[2].mem = 0x11230000 segment[2].memsz = 0x10000 segment[3].buf = 0x1115bb70 segment[3].bufsz = 0x5518 segment[3].mem = 0x11240000 segment[3].memsz = 0x10000 segment[4].buf = 0x11159ca0 segment[4].bufsz = 0x1000 segment[4].mem = 0x2e7f0000 segment[4].memsz = 0x10000 Regards, Bhupesh > > >> Regards, >> Bhupesh >> >> >> >> > >> >> > Regards, >> >> > Bhupesh >> >> > >> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> > >> via a kernel command line parameter, "memmap=". >> >> > >> >> >> > _______________________________________________ >> >> > kexec mailing list -- kexec@lists.fedoraproject.org >> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 22:28 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, linux-efi, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > Bhupesh, > > On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: >> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it >> >> to kexec@lists.infradead.org >> >> >> >> Also add linux-acpi list >> > >> > Thank you. >> > >> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> >> > <ard.biesheuvel@linaro.org> wrote: >> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> >> > > <takahiro.akashi@linaro.org> wrote: >> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> >> > >>> <takahiro.akashi@linaro.org> wrote: >> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> > >>> >> <takahiro.akashi@linaro.org> wrote: >> >> > >>> >> > Bhupesh, Ard, >> >> > >>> >> > >> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> > >>> >> >> Hi Ard, Akashi >> >> > >>> >> >> >> >> > >>> >> > (snip) >> >> > >>> >> > >> >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> > >>> >> >> identify its own usable 
memory and exclude, at its boot time, any >> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. >> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> > >>> >> >> , for details) >> >> > >>> >> > >> >> > >>> >> > Right. >> >> > >>> >> > >> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >> > >>> >> >> with the crashkernel memory range: >> >> > >>> >> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> > >>> >> >> address_cells, size_cells); >> >> > >>> >> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> > >>> >> >> , for details) >> >> > >>> >> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> > >>> >> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> >> > >>> >> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> > >>> >> >> -r`.img --reuse-cmdline -d >> >> > >>> >> >> >> >> > >>> >> >> [snip..] 
>> >> > >>> >> >> >> >> > >>> >> >> Reserved memory range >> >> > >>> >> >> 000000000e800000-000000002e7fffff (0) >> >> > >>> >> >> >> >> > >>> >> >> Coredump memory ranges >> >> > >>> >> >> 0000000000000000-000000000e7fffff (0) >> >> > >>> >> >> 000000002e800000-000000003961ffff (0) >> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) >> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) >> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) >> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) >> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) >> >> > >>> >> >> 000000a000000000-000000affbffffff (0) >> >> > >>> >> >> >> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> >> > >>> >> >> memory cap'ing passed to the crash kernel inside >> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): >> >> > >>> >> >> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) >> >> > >>> >> >> { >> >> > >>> >> >> struct memblock_region reg = { >> >> > >>> >> >> .size = 0, >> >> > >>> >> >> }; >> >> > >>> >> >> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >> >> > >>> >> >> >> >> > >>> >> >> if (reg.size) >> >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> >> > >>> >> >> comment this out */ >> >> > >>> >> >> } >> >> > >>> >> > >> >> > >>> >> > Please just don't do that. It can cause a fatal damage on >> >> > >>> >> > memory contents of the *crashed* kernel. >> >> > >>> >> > >> >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >> >> > >>> >> >> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> >> > >>> >> >> fail. >> >> > >>> >> >> >> >> > >>> >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are >> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> > >>> >> >> dt node 'linux,usable-memory-range' >> >> > >>> >> > >> >> > >>> >> > I still don't understand why we need to carry over the information >> >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> > >>> >> > such regions are free to be reused by the kernel after some point of >> >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after >> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> >> > >>> >> memblock_reserve()'d now. >> >> > >>> > >> >> > >>> > For my better understandings, who is actually accessing such regions >> >> > >>> > during boot time, uefi itself or efistub? >> >> > >>> > >> >> > >>> >> >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For >> >> > >>> instance, on QEMU we have >> >> > >>> >> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> > >>> 01000013) >> >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> >> >> > >>> covered by >> >> > >>> >> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> > >>> ... >> >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> > >> >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> >> > >> UEFI boot services. >> >> > >> >> >> > >>> >> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> >> > >>> >> when booting the next kernel. >> >> > >>> > >> >> > >>> > not really. >> >> > >>> > >> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> >> > >>> >> > on crash dump kernel?) >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> > >>> >> regions only revealed the bug, not created it (given that other >> >> > >>> >> memblock_reserve regions may be affected as well) >> >> > >>> > >> >> > >>> > As whether we should honor such reserved regions over kexec'ing >> >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. >> >> > >>> > As a matter of fact, no information about "reserved" memblocks is >> >> > >>> > exposed to user space (via proc/iomem). >> >> > >>> > >> >> > >>> >> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> >> > >>> as 'System RAM'. Do you think that could solve this? >> >> > >> >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> >> > >> marking them under another name in /proc/iomem would also be good in order >> >> > >> not to allocate them as part of crash kernel's memory. >> >> > >> >> >> > > >> >> > > I agree. However, this may not be entirely trivial, since iterating >> >> > > over the memblock_reserved table and creating iomem entries may result >> >> > > in collisions. >> >> > >> >> > I found a method (using the patch I shared earlier in this thread) to mark these >> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> >> > reserved regions. >> >> > >> >> > >> But I'm not still convinced that we should export them in useable- >> >> > >> memory-range to crash dump kernel. They will be accessed through >> >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> >> > >> (or memblocks), I guess. >> >> > > >> >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> >> > > which is exactly what we want in this case. >> >> > >> >> > Now this is what is confusing me. I don't see the above happening. 
>> >> > >> >> > I see that the primary kernel boots up and adds the ACPI regions via: >> >> > acpi_os_ioremap >> >> > -> ioremap_cache >> >> > >> >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> >> > variant. >> > >> > It is natural if that region is out of memblocks. >> >> Thanks for the confirmation. This was my understanding as well. >> >> >> > And it fails while accessing the ACPI tables: >> >> > >> >> > [ 0.039205] ACPI: Core revision 20170728 >> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP >> > >> > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. >> > As ioremap() makes the mapping as "Device memory", unaligned memory >> > access won't be allowed. >> > >> >> > [ 0.100022] Modules linked in: >> >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> >> > pstate: 60000045 >> >> > [ 0.132647] sp : ffff000008ccfb40 >> >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> >> > [ 0.189634] x9 : 000000000000005f 
x8 : ffff8000126d0140 >> >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 >> >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> >> > [ 0.223224] Call trace: >> >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> >> > ffff0000095e3980 ffff000008ccfbe0 >> >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> >> > ffff000008ccfc50 0000000000000000 >> >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> >> > 00000000ffffff76 0000000000000006 >> >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> >> > 000000000000038e 0000000000000000 >> >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> >> > 0000000000000005 000000000000001b >> >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> >> > ffff000009710027 0000000000000001 >> >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> >> > 0000000000000000 ffff0000088be820 >> >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> >> > ffff00000849b4f8 ffff000008ccfb40 >> >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> >> > ffff000008ccfb40 ffff000008260a18 >> >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> >> > ffff000008ccfb40 ffff0000084a6764 >> >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 
>> >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> >> > [ 0.404437] Rebooting in 10 seconds. >> >> > >> >> > So, I think the linear mapping done by the primary kernel does not >> >> > make these accessible in the crash kernel directly. >> >> > >> >> > Any pointers? >> >> >> >> Can you get the code line number for acpi_ns_lookup+0x25c? >> > >> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or >> > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned >> > accesses? >> > (I didn't find out how unaligned accesses could happen there.) >> > >> >> Right. Like I captured somewhere in this thread (perhaps the first >> email on this subject), >> this is indeed an unaligned address access. >> >> Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding >> assigning this memory range >> as device memory doesn't seem a neat solution as it means we are not >> marking some thing with the right memory attribute and we can fall in >> similar/related issues later. >> >> Regarding the later suggestion, what I am seeing now is that the acpi >> table access functions are perhaps reused from the earlier x86 >> implementation, but on the arm64 (or even arm) arch we should not be >> allowing unaligned accesses which might cause UNDEFINED behaviour and >> resultant crash. >> >> So I can try going this approach and see if it works for me. 
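For reference, the ESR value quoted above (0x96000021) can be decoded by hand. Below is a minimal sketch, assuming the standard ESR_ELx layout (EC in bits [31:26], fault status code in the low six bits of the ISS for data aborts). The macro and function names are illustrative only and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative local names, not the kernel's ESR_ELx_* macros. */
#define ESR_EC_SHIFT     26
#define ESR_EC_MASK      0x3fu
#define ESR_EC_DABT_LOW  0x24u  /* data abort from a lower exception level */
#define ESR_EC_DABT_CUR  0x25u  /* data abort without a change in exception level */
#define ESR_DFSC_MASK    0x3fu
#define ESR_DFSC_ALIGN   0x21u  /* alignment fault */

/* Exception class: bits [31:26] of the syndrome register. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

static bool esr_is_data_abort(uint32_t esr)
{
	unsigned int ec = esr_ec(esr);

	return ec == ESR_EC_DABT_LOW || ec == ESR_EC_DABT_CUR;
}

/* Data fault status code: only meaningful when the EC is a data abort. */
static unsigned int esr_dfsc(uint32_t esr)
{
	assert(esr_is_data_abort(esr));
	return esr & ESR_DFSC_MASK;
}

static bool esr_is_alignment_fault(uint32_t esr)
{
	return esr_is_data_abort(esr) && esr_dfsc(esr) == ESR_DFSC_ALIGN;
}
```

For 0x96000021 this gives EC = 0x25 and DFSC = 0x21, i.e. a data abort taken at the current exception level with an alignment fault. That is consistent with an unaligned access to a region that ioremap() mapped as Device memory.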
>> >> However, I am still not very sure as to why the crashkernel ranges >> historically do not include the System RAM regions (which may include >> the ACPI regions as well). These regions are available for the kernel >> usage and perhaps should be exported to the crashkernel as well. >> >> I am not fully aware of the previous discussions on capp'ing the >> crashkernel memory being passed to the kdump kernel, but did we run >> into any issues while doing so? >> >> Also, even if I extend the kexec-tools to modify the >> linux,usable-memory-range and add the ACPI regions to it, the >> crashkernel fails to boot with the below message (I have added some >> logic to print the DTB on the crash kernel boot start): >> >> [ 0.000000] chosen { >> [ 0.000000] linux,usable-memory-range >> [ 0.000000] = < >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x0e800000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x20000000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x396c0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x000a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x39770000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00040000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x398a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00020000 >> [ 0.000000] > >> [ 0.000000] ; >> >> [snip..] >> >> [ 0.000000] linux,usable-memory-range base e800000, size 20000000 >> [ 0.000000] - e800000 , 20000000 >> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 >> [ 0.000000] - 396c0000 , a0000 >> [ 0.000000] linux,usable-memory-range base 39770000, size 40000 >> [ 0.000000] - 39770000 , 40000 >> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 >> [ 0.000000] - 398a0000 , 20000 >> [ 0.000000] initrd not fully accessible via the linear mapping -- >> please check your bootloader ... 
>> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 >> arm64_memblock_init+0x210/0x484 >> [ 0.000000] Modules linked in: >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 >> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 >> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] >> pstate: 600000c5 >> [ 0.000000] sp : ffff000008ccfe80 >> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 >> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 >> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 >> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 >> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 >> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 >> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 >> [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 >> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 >> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 >> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 >> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d >> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 >> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 >> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 >> [ 0.000000] Call trace: >> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) >> [ 0.000000] fd40: 0000000000000056 0000000000000000 >> 0000000000000000 0000000000000000 >> [ 0.000000] fd60: 0000000000000001 ffff000008c96360 >> 000000000000000d 746f6f622072756f >> [ 0.000000] fd80: ffff000008517414 00000000000000f4 >> 2065687420616976 6d207261656e696c >> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d >> 79206b6365686320 000000002be00842 >> [ 0.000000] fdc0: ffff000008d05580 0000000000000000 >> 000000000c283806 
ffff000008afa000 >> [ 0.000000] fde0: ffff000008080000 ffff000008afa000 >> ffff000009680000 ffff000008ec0000 >> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 >> 00000000013b0000 0000000011230000 >> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 >> ffff000008b76984 ffff000008ccfe80 >> [ 0.000000] fe40: ffff000008b76984 00000000600000c5 >> ffff00000959b7a8 ffff000008ec0000 >> [ 0.000000] fe60: ffffffffffffffff 0000000000000005 >> ffff000008ccfe80 ffff000008b76984 >> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 >> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] random: get_random_bytes called from >> print_oops_end_marker+0x50/0x6c with crng_init=0 >> [ 0.000000] ---[ end trace 0000000000000000 ]--- >> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr >> [ 0.000000] cma: Failed to reserve 512 MiB >> [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate >> 0x0000000000010000 bytes below 0x0000000000000000. >> [ 0.000000] >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W >> ------------ 4.14.0+ #7 >> [ 0.000000] Call trace: >> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c >> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c >> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 >> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 >> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c >> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 >> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 >> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 >> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to >> allocate 0x0000000000010000 bytes below 0x0000000000000000. 
>> [ 0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
> * "Virtual kernel memory layout" in dmesg
> * /proc/iomem
> * debug messages from kexec-tools (kexec -d)

So here are the changes I have made so far in the kernel and
kexec-tools to allow mapping ACPI reclaim regions as identifiable
regions in '/proc/iomem' and to append them to the DTB property
linux,usable-memory-range:

Linux patches:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>, and
<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>

Note that I am not very clear about the hole margins that kexec-tools
adds (to meet the crashkernel's expectation that the kernel image and
initrd lie within a 1G boundary), so I have not added my temporary
changes to the github code - but any suggestions on how to correctly
put them in place would be appreciated.
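
For anyone reading the DTB dump quoted above, here is a rough sketch of how
the flat list of 32-bit cells in linux,usable-memory-range pairs up into
(base, size) ranges. It is only an illustration, not code from the kernel or
kexec-tools: the function name decode_usable_memory_range is made up, and it
assumes #address-cells = 2 and #size-cells = 2, which is what the dump and
the "base ..., size ..." log lines suggest.

```python
def decode_usable_memory_range(cells, addr_cells=2, size_cells=2):
    """Group flat 32-bit DT cells into (base, size) pairs.

    Hypothetical helper for illustration only. addr_cells and size_cells
    are assumed to be 2 each, matching the quoted dump.
    """
    stride = addr_cells + size_cells
    if len(cells) % stride != 0:
        raise ValueError("cell count is not a multiple of the entry size")

    def join(words):
        # Cells are big-endian: the first cell holds the upper 32 bits.
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    ranges = []
    for i in range(0, len(cells), stride):
        base = join(cells[i:i + addr_cells])
        size = join(cells[i + addr_cells:i + stride])
        ranges.append((base, size))
    return ranges


# Cell values copied from the DTB dump quoted above.
CELLS = [
    0x00000000, 0x0e800000, 0x00000000, 0x20000000,
    0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
    0x00000000, 0x39770000, 0x00000000, 0x00040000,
    0x00000000, 0x398a0000, 0x00000000, 0x00020000,
]

RANGES = decode_usable_memory_range(CELLS)
for base, size in RANGES:
    print(f"linux,usable-memory-range base {base:x}, size {size:x}")
```

The first entry is the crash kernel reservation itself and the remaining
three are the ACPI reclaim regions appended by the modified kexec-tools.
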
And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+ root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200
[ 0.000000] PCIe ASPM is disabled
[ 0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB) mapped at [ (ptrval)- (ptrval)]
[ 0.000000] Memory: 267251520K/268169216K available (7868K kernel code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K reserved, 0K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000 ( 128 MB)
[ 0.000000] vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000 (126847 GB)
[ 0.000000] .text : 0x (ptrval) - 0x (ptrval) ( 7872 KB)
[ 0.000000] .rodata : 0x (ptrval) - 0x (ptrval) ( 3392 KB)
[ 0.000000] .init : 0x (ptrval) - 0x (ptrval) ( 1280 KB)
[ 0.000000] .data : 0x (ptrval) - 0x (ptrval) ( 1765 KB)
[ 0.000000] .bss : 0x (ptrval) - 0x (ptrval) ( 7728 KB)
[ 0.000000] fixed : 0xffff7fdffe7b0000 - 0xffff7fdffec00000 ( 4416 KB)
[ 0.000000] PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000 ( 16 MB)
[ 0.000000] vmemmap : 0xffff7fe000000000 - 0xffff800000000000 ( 128 GB maximum)
[ 0.000000] 0xffff7fe000000000 - 0xffff7fe02bff0000 ( 703 MB actual)
[ 0.000000] memory : 0xffff800000000000 - 0xffff80affc000000 (720832 MB)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[ 0.000000] ftrace: allocating 29903 entries in 8 pages
[ 0.000000] Hierarchical RCU implementation.

(2) # cat /proc/iomem 00000000-3961ffff : System RAM 00080000-00b7ffff : Kernel code 00cc0000-0166ffff : Kernel data 0e800000-2e7fffff : Crash kernel 39620000-396bffff : reserved 396c0000-3975ffff : ACPI reclaim region 39760000-3976ffff : reserved 39770000-397affff : ACPI reclaim region 397b0000-3989ffff : reserved 398a0000-398bffff : ACPI reclaim region 398c0000-39d3ffff : reserved 39d40000-3ed2ffff : System RAM 3ed30000-3ed5ffff : reserved 3ed60000-3fbfffff : System RAM 40500000-40500fff : sbsa-gwdt.0 40500000-40500fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 60080000-6008ffff : HISI0152:00 602b0000-602b0fff : ARMH0011:00 602b0000-602b0fff : ARMH0011:00 603c0000-603cffff : HISI0141:00 603c0000-603cffff : HISI0141:00 a0080000-a008ffff : HISI0152:05 a0080000-a008ffff : HISI0152:04 a0080000-a008ffff : HISI0152:03 a00a0000-a00affff : pnp 00:01 a01b0000-a01b0fff : HISI0191:00 a2000000-a200ffff : HISI0162:01 a2000000-a200ffff : HISI0162:01 a3000000-a300ffff : HISI0162:02 a3000000-a300ffff : HISI0162:02 a7020000-a702ffff : PNP0D20:00 a7020000-a702ffff : PNP0D20:00 b0000000-be7fffff : PCI Bus 0002:e8 b0000000-b06fffff : PCI Bus 0002:e9 b0000000-b00fffff : 0002:e9:00.0 b0000000-b00fffff : igb b0100000-b01fffff : 0002:e9:00.0 b0200000-b02fffff : 0002:e9:00.1 b0200000-b02fffff : igb b0300000-b03fffff : 0002:e9:00.1 b0400000-b04fffff : 0002:e9:00.2 b0400000-b04fffff : igb b0500000-b05fffff : 0002:e9:00.3 b0500000-b05fffff : igb b0600000-b0603fff : 0002:e9:00.0 b0600000-b0603fff : igb b0604000-b0607fff : 0002:e9:00.1 b0604000-b0607fff : igb b0608000-b060bfff : 0002:e9:00.2 b0608000-b060bfff : igb b060c000-b060ffff : 0002:e9:00.3 b060c000-b060ffff : igb b0700000-b0afffff : PCI Bus 0002:e9 b0700000-b077ffff : 0002:e9:00.0 b0780000-b07fffff : 0002:e9:00.0 b0800000-b087ffff : 0002:e9:00.1 b0880000-b08fffff : 0002:e9:00.1 b0900000-b097ffff : 0002:e9:00.2 b0980000-b09fffff : 0002:e9:00.2 b0a00000-b0a7ffff : 0002:e9:00.3 b0a80000-b0afffff : 
0002:e9:00.3 b0b00000-b0b0ffff : 0002:e8:00.0 be800000-beffffff : PCI ECAM c0080000-c008ffff : HISI0152:02 c0080000-c008ffff : HISI0152:01 c3000000-c300ffff : HISI0162:00 c3000000-c300ffff : HISI0162:00 c5000000-c588ffff : HISI00B2:00 c5000000-c588ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 d0080000-d008ffff : HISI0152:07 d0080000-d008ffff : HISI0152:06 d0100000-d010ffff : HISI02A1:00 d0100000-d010ffff : HISI02A1:00 400000000-4007fffff : PCI ECAM 440000000-4ffffffff : PCI Bus 0005:00 440000000-4407fffff : PCI Bus 0005:01 440000000-4403fffff : 0005:01:00.0 440400000-4407fffff : 0005:01:00.1 440800000-4421fffff : PCI Bus 0005:01 440800000-440bfffff : 0005:01:00.0 440800000-440bfffff : ixgbe 440c00000-440ffffff : 0005:01:00.1 440c00000-440ffffff : ixgbe 441000000-4413fffff : 0005:01:00.0 441400000-4417fffff : 0005:01:00.0 441800000-441bfffff : 0005:01:00.1 441c00000-441ffffff : 0005:01:00.1 442000000-442003fff : 0005:01:00.0 442000000-442003fff : ixgbe 442004000-442007fff : 0005:01:00.1 442004000-442007fff : ixgbe 442200000-442200fff : 0005:00:00.0 700090000-70009ffff : pnp 00:03 7000a0000-7000affff : pnp 00:05 7000b0000-7000bffff : pnp 00:06 700200000-70020ffff : pnp 00:04 740800000-740ffffff : PCI ECAM 741000000-77ffeffff : PCI Bus 0006:08 741000000-74100ffff : 0006:08:00.0 784000000-7847fffff : PCI ECAM 784800000-7bffeffff : PCI Bus 0007:40 784800000-7849fffff : PCI Bus 0007:41 784800000-7849fffff : 0007:41:00.0 786000000-787ffffff : PCI Bus 0007:41 786000000-787ffffff : 0007:41:00.0 7c4800000-7c4ffffff : PCI ECAM 7c5000000-7fffeffff : PCI Bus 0004:48 7c5000000-7c51fffff : PCI Bus 0004:49 7c5000000-7c50fffff : 0004:49:00.0 7c5100000-7c513ffff : 0004:49:00.0 7c5100000-7c513ffff : mpt3sas 7c5140000-7c514ffff : 0004:49:00.0 7c5140000-7c514ffff : mpt3sas 7c5200000-7c520ffff : 0004:48:00.0 1040000000-1ffbffffff : System RAM 2000000000-2ffbffffff : System RAM 9000000000-9ffbffffff : System RAM a000000000-affbffffff : System RAM 
400c0080000-400c008ffff : HISI0152:08
600a00a0000-600a00affff : pnp 00:08
64001000000-64001ffffff : PCI ECAM
65040000000-650ffffffff : PCI Bus 000a:10
65040000000-6504000ffff : 000a:10:00.0
700a0090000-700a009ffff : pnp 00:0a
700a0200000-700a020ffff : pnp 00:0b
74002000000-74002ffffff : PCI ECAM
75040000000-750ffffffff : PCI Bus 000c:20
75040000000-7504000ffff : 000c:20:00.0
78003000000-78003ffffff : PCI ECAM
79040000000-790ffffffff : PCI Bus 000d:30
79040000000-79040000fff : 000d:30:00.0

(3) # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d

arch_process_options:149: command_line: root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200
arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img
arch_process_options:152: dtb: (null)
Try gzip decompression.
kernel: 0xffff968d0010 kernel_size: 0xdf9200
get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM
get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved
get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved
get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved
get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved
get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM
get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved
get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM
get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM
get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM
get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM
elf_arm64_probe: Not an ELF executable.
image_arm64_load: kernel_segment: 000000000e800000
image_arm64_load: text_offset: 0000000000080000
image_arm64_load: image_size: 00000000015f0000
image_arm64_load: phys_offset: 0000000000000000
image_arm64_load: vp_offset: ffffffffffffffff
image_arm64_load: PE format: yes
Reserved memory range
000000000e800000-000000002e7fffff (0)
Coredump memory ranges
0000000000000000-000000000e7fffff (0)
000000002e800000-000000003961ffff (0)
0000000039d40000-000000003ed2ffff (0)
000000003ed60000-000000003fbfffff (0)
0000001040000000-0000001ffbffffff (0)
0000002000000000-0000002ffbffffff (0)
0000009000000000-0000009ffbffffff (0)
000000a000000000-000000affbffffff (0)
ACPI reclaim memory ranges
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
crashkernel memory ranges
000000000e800000-000000002e7fffff (0)
00000000396c0000-000000003975ffff (0)
0000000039770000-00000000397affff (0)
00000000398a0000-00000000398bffff (0)
kernel symbol _text
vaddr = ffff000008080000 load_crashdump_segments: page_offset: ffff800000000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 
0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 
0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr = 0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000 Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000 p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000 Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000 p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000 Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000 p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000 Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000 p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz = 0xfbc000000 Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000 p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000 p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000 p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 
0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 
0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000 p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000 Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000 p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000 Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000 p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000 load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff read_1st_dtb: found /sys/firmware/fdt get_cells_size: 
#address-cells:2 #size-cells:2 cells_size_fitted: 2e7f0000-2e7f0fff cells_size_fitted: e800000-2e7fffff cells_size_fitted: 396c0000-3975ffff cells_size_fitted: 39770000-397affff cells_size_fitted: 398a0000-398bffff / { #size-cells = <0x00000002>; #address-cells = <0x00000002>; chosen { linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000 0x20000000 0x00000000 0x396c0000 0x00000000 0x000a0000 0x00000000 0x39770000 0x00000000 0x00040000 0x00000000 0x398a0000 0x00000000 0x00020000>; linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>; linux,uefi-mmap-desc-ver = <0x00000001>; linux,uefi-mmap-desc-size = <0x00000030>; linux,uefi-mmap-size = <0x00000e40>; linux,uefi-mmap-start = <0x00000000 0x30288018>; linux,uefi-system-table = <0x00000000 0x3ed50018>; bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200"; linux,initrd-end = <0x00000000 0x2fbff9e0>; linux,initrd-start = <0x00000000 0x2e84d000>; }; }; initrd: base fe70000, size 13b29e0h (20654560), end 112229e0 [snip..] 
sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c sym: sha256_starts value: 11240eb0 addr: 11240018 machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6 sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c sym: sha256_update value: 11245158 addr: 11240034 machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449 sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc sym: sha256_finish value: 11245164 addr: 11240050 machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445 sym: memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34 sym: memcmp value: 11240634 addr: 11240060 machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240070 machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240078 machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240088 machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400a8 machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400b0 machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400c0 machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400d4 machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112453a8 addr: 112400f0 
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245338 addr: 112400f8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245358 addr: 11240100 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245368 addr: 11240108 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 1124536e addr: 11240110 machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245370 addr: 11240118 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 1124012c machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106 sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4 sym: setup_arch value: 11240ea8 addr: 11240130 machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0 sym: verify_sha256_digest value: 11240000 addr: 11240134 machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3 sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4 sym: post_verification_setup_arch value: 11240ea4 addr: 11240144 machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245380 addr: 11240148 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 112401ac 
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240220 machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240478 machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245392 addr: 112404b8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 11240538 machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 112405c8 machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2 sym: purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28 sym: purgatory value: 11240120 addr: 11240678 machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8 sym: arm64_kernel_entry value: 112454c8 addr: 1124067c machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271 sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8 sym: arm64_dtb_addr value: 112454d0 addr: 11240680 machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 112450bc machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98 sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245118 machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245130 machine_apply_elf_rel: CALL26 
aa1503e094000000->aa1503e097ffed39 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 1124513c machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78 sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112454d8 addr: 11245330 machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8 kexec_load: entry = 0x11240670 flags = 0xb70001 nr_segments = 5 segment[0].buf = 0xffff968d0010 segment[0].bufsz = 0xdf9200 segment[0].mem = 0xe880000 segment[0].memsz = 0x15f0000 segment[1].buf = 0xffff950e0010 segment[1].bufsz = 0x13b29e0 segment[1].mem = 0xfe70000 segment[1].memsz = 0x13c0000 segment[2].buf = 0x1115b440 segment[2].bufsz = 0x33d segment[2].mem = 0x11230000 segment[2].memsz = 0x10000 segment[3].buf = 0x1115bb70 segment[3].bufsz = 0x5518 segment[3].mem = 0x11240000 segment[3].memsz = 0x10000 segment[4].buf = 0x11159ca0 segment[4].bufsz = 0x1000 segment[4].mem = 0x2e7f0000 segment[4].memsz = 0x10000 Regards, Bhupesh > > >> Regards, >> Bhupesh >> >> >> >> > >> >> > Regards, >> >> > Bhupesh >> >> > >> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> > >> via a kernel command line parameter, "memmap=".
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 22:28 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw) To: linux-arm-kernel On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > Bhupesh, > > On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: >> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> >> kexec at fedoraproject... is for Fedora kexec scripts discussion, changed it >> >> to kexec at lists.infradead.org >> >> >> >> Also add linux-acpi list >> > >> > Thank you. >> > >> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> >> > <ard.biesheuvel@linaro.org> wrote: >> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> >> > > <takahiro.akashi@linaro.org> wrote: >> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> >> > >>> <takahiro.akashi@linaro.org> wrote: >> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> > >>> >> <takahiro.akashi@linaro.org> wrote: >> >> > >>> >> > Bhupesh, Ard, >> >> > >>> >> > >> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> > >>> >> >> Hi Ard, Akashi >> >> > >>> >> >> >> >> > >>> >> > (snip) >> >> > >>> >> > >> >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> > >>> >> >> identify its own usable memory and exclude, at its boot time, any >> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. 
>> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> > >>> >> >> , for details) >> >> > >>> >> > >> >> > >>> >> > Right. >> >> > >>> >> > >> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >> > >>> >> >> with the crashkernel memory range: >> >> > >>> >> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> > >>> >> >> address_cells, size_cells); >> >> > >>> >> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> > >>> >> >> , for details) >> >> > >>> >> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> > >>> >> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> >> > >>> >> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> > >>> >> >> -r`.img --reuse-cmdline -d >> >> > >>> >> >> >> >> > >>> >> >> [snip..] 
>> >> > >>> >> >>
>> >> > >>> >> >> Reserved memory range
>> >> > >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> Coredump memory ranges
>> >> > >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >> > >>> >> >> 000000002e800000-000000003961ffff (0)
>> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >> > >>> >> >> 000000a000000000-000000affbffffff (0)
>> >> > >>> >> >>
>> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> > >>> >> >> memory cap'ing passed to the crash kernel inside
>> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >> > >>> >> >>
>> >> > >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >> > >>> >> >> {
>> >> > >>> >> >>         struct memblock_region reg = {
>> >> > >>> >> >>                 .size = 0,
>> >> > >>> >> >>         };
>> >> > >>> >> >>
>> >> > >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> > >>> >> >>
>> >> > >>> >> >>         if (reg.size)
>> >> > >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >> > >>> >> >> }
>> >> > >>> >> >
>> >> > >>> >> > Please just don't do that. It can cause a fatal damage on
>> >> > >>> >> > memory contents of the *crashed* kernel.
>> >> > >>> >> >
>> >> > >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >> > >>> >> >>
>> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> > >>> >> >> fail.
>> >> > >>> >> >>
>> >> > >>> >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are >> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> > >>> >> >> dt node 'linux,usable-memory-range' >> >> > >>> >> > >> >> > >>> >> > I still don't understand why we need to carry over the information >> >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> > >>> >> > such regions are free to be reused by the kernel after some point of >> >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after >> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> >> > >>> >> memblock_reserve()'d now. >> >> > >>> > >> >> > >>> > For my better understandings, who is actually accessing such regions >> >> > >>> > during boot time, uefi itself or efistub? >> >> > >>> > >> >> > >>> >> >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For >> >> > >>> instance, on QEMU we have >> >> > >>> >> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> > >>> 01000013) >> >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> >> >> > >>> covered by >> >> > >>> >> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> > >>> ... >> >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> > >> >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> >> > >> UEFI boot services. >> >> > >> >> >> > >>> >> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> >> > >>> >> when booting the next kernel. >> >> > >>> > >> >> > >>> > not really. >> >> > >>> > >> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> >> > >>> >> > on crash dump kernel?) >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> > >>> >> regions only revealed the bug, not created it (given that other >> >> > >>> >> memblock_reserve regions may be affected as well) >> >> > >>> > >> >> > >>> > As whether we should honor such reserved regions over kexec'ing >> >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. >> >> > >>> > As a matter of fact, no information about "reserved" memblocks is >> >> > >>> > exposed to user space (via proc/iomem). >> >> > >>> > >> >> > >>> >> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> >> > >>> as 'System RAM'. Do you think that could solve this? >> >> > >> >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> >> > >> marking them under another name in /proc/iomem would also be good in order >> >> > >> not to allocate them as part of crash kernel's memory. >> >> > >> >> >> > > >> >> > > I agree. However, this may not be entirely trivial, since iterating >> >> > > over the memblock_reserved table and creating iomem entries may result >> >> > > in collisions. >> >> > >> >> > I found a method (using the patch I shared earlier in this thread) to mark these >> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> >> > reserved regions. >> >> > >> >> > >> But I'm not still convinced that we should export them in useable- >> >> > >> memory-range to crash dump kernel. They will be accessed through >> >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> >> > >> (or memblocks), I guess. >> >> > > >> >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> >> > > which is exactly what we want in this case. >> >> > >> >> > Now this is what is confusing me. I don't see the above happening. 
>> >> > >> >> > I see that the primary kernel boots up and adds the ACPI regions via: >> >> > acpi_os_ioremap >> >> > -> ioremap_cache >> >> > >> >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> >> > variant. >> > >> > It is natural if that region is out of memblocks. >> >> Thanks for the confirmation. This was my understanding as well. >> >> >> > And it fails while accessing the ACPI tables: >> >> > >> >> > [ 0.039205] ACPI: Core revision 20170728 >> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP >> > >> > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. >> > As ioremap() makes the mapping as "Device memory", unaligned memory >> > access won't be allowed. >> > >> >> > [ 0.100022] Modules linked in: >> >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> >> > pstate: 60000045 >> >> > [ 0.132647] sp : ffff000008ccfb40 >> >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> >> > [ 0.189634] x9 : 000000000000005f 
x8 : ffff8000126d0140 >> >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 >> >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> >> > [ 0.223224] Call trace: >> >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> >> > ffff0000095e3980 ffff000008ccfbe0 >> >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> >> > ffff000008ccfc50 0000000000000000 >> >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> >> > 00000000ffffff76 0000000000000006 >> >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> >> > 000000000000038e 0000000000000000 >> >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> >> > 0000000000000005 000000000000001b >> >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> >> > ffff000009710027 0000000000000001 >> >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> >> > 0000000000000000 ffff0000088be820 >> >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> >> > ffff00000849b4f8 ffff000008ccfb40 >> >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> >> > ffff000008ccfb40 ffff000008260a18 >> >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> >> > ffff000008ccfb40 ffff0000084a6764 >> >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 
>> >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> >> > [ 0.404437] Rebooting in 10 seconds. >> >> > >> >> > So, I think the linear mapping done by the primary kernel does not >> >> > make these accessible in the crash kernel directly. >> >> > >> >> > Any pointers? >> >> >> >> Can you get the code line number for acpi_ns_lookup+0x25c? >> > >> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or >> > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned >> > accesses? >> > (I didn't find out how unaligned accesses could happen there.) >> > >> >> Right. Like I captured somewhere in this thread (perhaps the first >> email on this subject), >> this is indeed an unaligned address access. >> >> Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding >> assigning this memory range >> as device memory doesn't seem a neat solution as it means we are not >> marking some thing with the right memory attribute and we can fall in >> similar/related issues later. >> >> Regarding the later suggestion, what I am seeing now is that the acpi >> table access functions are perhaps reused from the earlier x86 >> implementation, but on the arm64 (or even arm) arch we should not be >> allowing unaligned accesses which might cause UNDEFINED behaviour and >> resultant crash. >> >> So I can try going this approach and see if it works for me. 
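For reference, the ESR value in the oops quoted above (0x96000021) can be decoded by hand to confirm the alignment-fault reading. Below is a minimal sketch in C, assuming the standard ARMv8-A ESR_ELx layout (EC in bits [31:26], IL in bit 25, DFSC in bits [5:0] for data aborts). The helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not kernel APIs): pick apart an ESR_ELx value. */

/* Exception Class, bits [31:26].
 * 0x24 = data abort from a lower exception level,
 * 0x25 = data abort taken without a change in exception level. */
static inline unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Instruction Length, bit 25 (1 = 32-bit instruction). */
static inline unsigned int esr_il(uint32_t esr)
{
	return (esr >> 25) & 0x1;
}

/* Data Fault Status Code, bits [5:0]; only meaningful for data aborts.
 * 0x21 = alignment fault. */
static inline unsigned int esr_dfsc(uint32_t esr)
{
	assert(esr_ec(esr) == 0x24 || esr_ec(esr) == 0x25);
	return esr & 0x3f;
}

/* True if the ESR describes an alignment fault hit by the kernel itself. */
static inline int esr_is_alignment_fault(uint32_t esr)
{
	return esr_ec(esr) == 0x25 && esr_dfsc(esr) == 0x21;
}
```

For 0x96000021 this gives EC = 0x25 and DFSC = 0x21, which matches Akashi's description of a data abort caused by an unaligned access to a Device-memory mapping.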
>> >> However, I am still not very sure as to why the crashkernel ranges >> historically do not include the System RAM regions (which may include >> the ACPI regions as well). These regions are available for the kernel >> usage and perhaps should be exported to the crashkernel as well. >> >> I am not fully aware of the previous discussions on capp'ing the >> crashkernel memory being passed to the kdump kernel, but did we run >> into any issues while doing so? >> >> Also, even if I extend the kexec-tools to modify the >> linux,usable-memory-range and add the ACPI regions to it, the >> crashkernel fails to boot with the below message (I have added some >> logic to print the DTB on the crash kernel boot start): >> >> [ 0.000000] chosen { >> [ 0.000000] linux,usable-memory-range >> [ 0.000000] = < >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x0e800000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x20000000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x396c0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x000a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x39770000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00040000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x398a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00020000 >> [ 0.000000] > >> [ 0.000000] ; >> >> [snip..] >> >> [ 0.000000] linux,usable-memory-range base e800000, size 20000000 >> [ 0.000000] - e800000 , 20000000 >> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 >> [ 0.000000] - 396c0000 , a0000 >> [ 0.000000] linux,usable-memory-range base 39770000, size 40000 >> [ 0.000000] - 39770000 , 40000 >> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 >> [ 0.000000] - 398a0000 , 20000 >> [ 0.000000] initrd not fully accessible via the linear mapping -- >> please check your bootloader ... 
>> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 >> arm64_memblock_init+0x210/0x484 >> [ 0.000000] Modules linked in: >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 >> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 >> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] >> pstate: 600000c5 >> [ 0.000000] sp : ffff000008ccfe80 >> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 >> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 >> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 >> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 >> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 >> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 >> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 >> [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 >> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 >> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 >> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 >> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d >> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 >> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 >> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 >> [ 0.000000] Call trace: >> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) >> [ 0.000000] fd40: 0000000000000056 0000000000000000 >> 0000000000000000 0000000000000000 >> [ 0.000000] fd60: 0000000000000001 ffff000008c96360 >> 000000000000000d 746f6f622072756f >> [ 0.000000] fd80: ffff000008517414 00000000000000f4 >> 2065687420616976 6d207261656e696c >> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d >> 79206b6365686320 000000002be00842 >> [ 0.000000] fdc0: ffff000008d05580 0000000000000000 >> 000000000c283806 
ffff000008afa000 >> [ 0.000000] fde0: ffff000008080000 ffff000008afa000 >> ffff000009680000 ffff000008ec0000 >> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 >> 00000000013b0000 0000000011230000 >> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 >> ffff000008b76984 ffff000008ccfe80 >> [ 0.000000] fe40: ffff000008b76984 00000000600000c5 >> ffff00000959b7a8 ffff000008ec0000 >> [ 0.000000] fe60: ffffffffffffffff 0000000000000005 >> ffff000008ccfe80 ffff000008b76984 >> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 >> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] random: get_random_bytes called from >> print_oops_end_marker+0x50/0x6c with crng_init=0 >> [ 0.000000] ---[ end trace 0000000000000000 ]--- >> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr >> [ 0.000000] cma: Failed to reserve 512 MiB >> [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate >> 0x0000000000010000 bytes below 0x0000000000000000. >> [ 0.000000] >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W >> ------------ 4.14.0+ #7 >> [ 0.000000] Call trace: >> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c >> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c >> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 >> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 >> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c >> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 >> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 >> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 >> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to >> allocate 0x0000000000010000 bytes below 0x0000000000000000. 
>> [ 0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
> * "Virtual kernel memory layout" in dmesg
> * /proc/iomem
> * debug messages from kexec-tools (kexec -d)

So here are the changes which I have made so far in the kernel and
kexec-tools to allow mapping ACPI reclaim regions as identifiable
regions in '/proc/iomem' and to append them to the DTB property
'linux,usable-memory-range':

Linux patches:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>, and
<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>

Note that I am not very clear about the hole margins that kexec-tools
adds (to meet the crashkernel's expectation that the kernel image and
initrd lie within a 1G boundary), so I have not added my temporary
changes to the github code. Any suggestions on how to correctly put
them in place would be appreciated.
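
For reference, here is a minimal sketch (not part of the patches linked
above) of how the cells printed for 'linux,usable-memory-range' in the
quoted log decode into (base, size) pairs. It assumes #address-cells = 2
and #size-cells = 2, which is consistent with the decoded "base ..., size
..." lines in that log; the helper names join_cells() and decode_ranges()
are made up for illustration only.

```python
# Cells copied from the 'linux,usable-memory-range' dump in the quoted log.
CELLS = [
    0x00000000, 0x0e800000, 0x00000000, 0x20000000,
    0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
    0x00000000, 0x39770000, 0x00000000, 0x00040000,
    0x00000000, 0x398a0000, 0x00000000, 0x00020000,
]


def join_cells(cells):
    """Combine 32-bit cells (most significant first) into one integer."""
    value = 0
    for cell in cells:
        value = (value << 32) | cell
    return value


def decode_ranges(cells, addr_cells=2, size_cells=2):
    """Split a flat cell list into (base, size) pairs.

    addr_cells/size_cells are ASSUMED to be 2 here; a real parser would
    read #address-cells/#size-cells from the DTB root node instead.
    """
    stride = addr_cells + size_cells
    if len(cells) % stride:
        raise ValueError("cell count is not a multiple of the entry size")
    ranges = []
    for i in range(0, len(cells), stride):
        base = join_cells(cells[i:i + addr_cells])
        size = join_cells(cells[i + addr_cells:i + stride])
        ranges.append((base, size))
    return ranges


if __name__ == "__main__":
    for base, size in decode_ranges(CELLS):
        # The inclusive end address makes it easy to compare against /proc/iomem.
        print(f"base {base:x}, size {size:x}, end {base + size - 1:x}")
```

The first entry decodes to base e800000 and size 20000000, i.e. the
"Crash kernel" range 0e800000-2e7fffff, and the remaining three entries
line up with the "ACPI reclaim region" ranges in /proc/iomem.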

And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+ root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200
[ 0.000000] PCIe ASPM is disabled
[ 0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB) mapped at [ (ptrval)- (ptrval)]
[ 0.000000] Memory: 267251520K/268169216K available (7868K kernel code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K reserved, 0K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000 ( 128 MB)
[ 0.000000] vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000 (126847 GB)
[ 0.000000] .text : 0x (ptrval) - 0x (ptrval) ( 7872 KB)
[ 0.000000] .rodata : 0x (ptrval) - 0x (ptrval) ( 3392 KB)
[ 0.000000] .init : 0x (ptrval) - 0x (ptrval) ( 1280 KB)
[ 0.000000] .data : 0x (ptrval) - 0x (ptrval) ( 1765 KB)
[ 0.000000] .bss : 0x (ptrval) - 0x (ptrval) ( 7728 KB)
[ 0.000000] fixed : 0xffff7fdffe7b0000 - 0xffff7fdffec00000 ( 4416 KB)
[ 0.000000] PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000 ( 16 MB)
[ 0.000000] vmemmap : 0xffff7fe000000000 - 0xffff800000000000 ( 128 GB maximum)
[ 0.000000] 0xffff7fe000000000 - 0xffff7fe02bff0000 ( 703 MB actual)
[ 0.000000] memory : 0xffff800000000000 - 0xffff80affc000000 (720832 MB)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[ 0.000000] ftrace: allocating 29903 entries in 8 pages
[ 0.000000] Hierarchical RCU implementation.
(2) # cat /proc/iomem 00000000-3961ffff : System RAM 00080000-00b7ffff : Kernel code 00cc0000-0166ffff : Kernel data 0e800000-2e7fffff : Crash kernel 39620000-396bffff : reserved 396c0000-3975ffff : ACPI reclaim region 39760000-3976ffff : reserved 39770000-397affff : ACPI reclaim region 397b0000-3989ffff : reserved 398a0000-398bffff : ACPI reclaim region 398c0000-39d3ffff : reserved 39d40000-3ed2ffff : System RAM 3ed30000-3ed5ffff : reserved 3ed60000-3fbfffff : System RAM 40500000-40500fff : sbsa-gwdt.0 40500000-40500fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 60080000-6008ffff : HISI0152:00 602b0000-602b0fff : ARMH0011:00 602b0000-602b0fff : ARMH0011:00 603c0000-603cffff : HISI0141:00 603c0000-603cffff : HISI0141:00 a0080000-a008ffff : HISI0152:05 a0080000-a008ffff : HISI0152:04 a0080000-a008ffff : HISI0152:03 a00a0000-a00affff : pnp 00:01 a01b0000-a01b0fff : HISI0191:00 a2000000-a200ffff : HISI0162:01 a2000000-a200ffff : HISI0162:01 a3000000-a300ffff : HISI0162:02 a3000000-a300ffff : HISI0162:02 a7020000-a702ffff : PNP0D20:00 a7020000-a702ffff : PNP0D20:00 b0000000-be7fffff : PCI Bus 0002:e8 b0000000-b06fffff : PCI Bus 0002:e9 b0000000-b00fffff : 0002:e9:00.0 b0000000-b00fffff : igb b0100000-b01fffff : 0002:e9:00.0 b0200000-b02fffff : 0002:e9:00.1 b0200000-b02fffff : igb b0300000-b03fffff : 0002:e9:00.1 b0400000-b04fffff : 0002:e9:00.2 b0400000-b04fffff : igb b0500000-b05fffff : 0002:e9:00.3 b0500000-b05fffff : igb b0600000-b0603fff : 0002:e9:00.0 b0600000-b0603fff : igb b0604000-b0607fff : 0002:e9:00.1 b0604000-b0607fff : igb b0608000-b060bfff : 0002:e9:00.2 b0608000-b060bfff : igb b060c000-b060ffff : 0002:e9:00.3 b060c000-b060ffff : igb b0700000-b0afffff : PCI Bus 0002:e9 b0700000-b077ffff : 0002:e9:00.0 b0780000-b07fffff : 0002:e9:00.0 b0800000-b087ffff : 0002:e9:00.1 b0880000-b08fffff : 0002:e9:00.1 b0900000-b097ffff : 0002:e9:00.2 b0980000-b09fffff : 0002:e9:00.2 b0a00000-b0a7ffff : 0002:e9:00.3 b0a80000-b0afffff : 
0002:e9:00.3 b0b00000-b0b0ffff : 0002:e8:00.0 be800000-beffffff : PCI ECAM c0080000-c008ffff : HISI0152:02 c0080000-c008ffff : HISI0152:01 c3000000-c300ffff : HISI0162:00 c3000000-c300ffff : HISI0162:00 c5000000-c588ffff : HISI00B2:00 c5000000-c588ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 d0080000-d008ffff : HISI0152:07 d0080000-d008ffff : HISI0152:06 d0100000-d010ffff : HISI02A1:00 d0100000-d010ffff : HISI02A1:00 400000000-4007fffff : PCI ECAM 440000000-4ffffffff : PCI Bus 0005:00 440000000-4407fffff : PCI Bus 0005:01 440000000-4403fffff : 0005:01:00.0 440400000-4407fffff : 0005:01:00.1 440800000-4421fffff : PCI Bus 0005:01 440800000-440bfffff : 0005:01:00.0 440800000-440bfffff : ixgbe 440c00000-440ffffff : 0005:01:00.1 440c00000-440ffffff : ixgbe 441000000-4413fffff : 0005:01:00.0 441400000-4417fffff : 0005:01:00.0 441800000-441bfffff : 0005:01:00.1 441c00000-441ffffff : 0005:01:00.1 442000000-442003fff : 0005:01:00.0 442000000-442003fff : ixgbe 442004000-442007fff : 0005:01:00.1 442004000-442007fff : ixgbe 442200000-442200fff : 0005:00:00.0 700090000-70009ffff : pnp 00:03 7000a0000-7000affff : pnp 00:05 7000b0000-7000bffff : pnp 00:06 700200000-70020ffff : pnp 00:04 740800000-740ffffff : PCI ECAM 741000000-77ffeffff : PCI Bus 0006:08 741000000-74100ffff : 0006:08:00.0 784000000-7847fffff : PCI ECAM 784800000-7bffeffff : PCI Bus 0007:40 784800000-7849fffff : PCI Bus 0007:41 784800000-7849fffff : 0007:41:00.0 786000000-787ffffff : PCI Bus 0007:41 786000000-787ffffff : 0007:41:00.0 7c4800000-7c4ffffff : PCI ECAM 7c5000000-7fffeffff : PCI Bus 0004:48 7c5000000-7c51fffff : PCI Bus 0004:49 7c5000000-7c50fffff : 0004:49:00.0 7c5100000-7c513ffff : 0004:49:00.0 7c5100000-7c513ffff : mpt3sas 7c5140000-7c514ffff : 0004:49:00.0 7c5140000-7c514ffff : mpt3sas 7c5200000-7c520ffff : 0004:48:00.0 1040000000-1ffbffffff : System RAM 2000000000-2ffbffffff : System RAM 9000000000-9ffbffffff : System RAM a000000000-affbffffff : System RAM 
400c0080000-400c008ffff : HISI0152:08 600a00a0000-600a00affff : pnp 00:08 64001000000-64001ffffff : PCI ECAM 65040000000-650ffffffff : PCI Bus 000a:10 65040000000-6504000ffff : 000a:10:00.0 700a0090000-700a009ffff : pnp 00:0a 700a0200000-700a020ffff : pnp 00:0b 74002000000-74002ffffff : PCI ECAM 75040000000-750ffffffff : PCI Bus 000c:20 75040000000-7504000ffff : 000c:20:00.0 78003000000-78003ffffff : PCI ECAM 79040000000-790ffffffff : PCI Bus 000d:30 79040000000-79040000fff : 000d:30:00.0 (3) # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d arch_process_options:149: command_line: root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200 arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img arch_process_options:152: dtb: (null) Try gzip decompression. 
kernel: 0xffff968d0010 kernel_size: 0xdf9200 get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM elf_arm64_probe: Not an ELF executable. image_arm64_load: kernel_segment: 000000000e800000 image_arm64_load: text_offset: 0000000000080000 image_arm64_load: image_size: 00000000015f0000 image_arm64_load: phys_offset: 0000000000000000 image_arm64_load: vp_offset: ffffffffffffffff image_arm64_load: PE format: yes Reserved memory range 000000000e800000-000000002e7fffff (0) Coredump memory ranges 0000000000000000-000000000e7fffff (0) 000000002e800000-000000003961ffff (0) 0000000039d40000-000000003ed2ffff (0) 000000003ed60000-000000003fbfffff (0) 0000001040000000-0000001ffbffffff (0) 0000002000000000-0000002ffbffffff (0) 0000009000000000-0000009ffbffffff (0) 000000a000000000-000000affbffffff (0) ACPI reclaim memory ranges 00000000396c0000-000000003975ffff (0) 0000000039770000-00000000397affff (0) 00000000398a0000-00000000398bffff (0) crashkernel memory ranges 000000000e800000-000000002e7fffff (0) 00000000396c0000-000000003975ffff (0) 0000000039770000-00000000397affff (0) 00000000398a0000-00000000398bffff (0) kernel symbol _text 
vaddr = ffff000008080000 load_crashdump_segments: page_offset: ffff800000000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 
0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 
0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr = 0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000 Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000 p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000 Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000 p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000 Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000 p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000 Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000 p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz = 0xfbc000000 Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000 p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000 p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000 p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 
0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 
0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000 p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000 Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000 p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000 Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000 p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000 load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff read_1st_dtb: found /sys/firmware/fdt get_cells_size: 
#address-cells:2 #size-cells:2 cells_size_fitted: 2e7f0000-2e7f0fff cells_size_fitted: e800000-2e7fffff cells_size_fitted: 396c0000-3975ffff cells_size_fitted: 39770000-397affff cells_size_fitted: 398a0000-398bffff / { #size-cells = <0x00000002>; #address-cells = <0x00000002>; chosen { linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000 0x20000000 0x00000000 0x396c0000 0x00000000 0x000a0000 0x00000000 0x39770000 0x00000000 0x00040000 0x00000000 0x398a0000 0x00000000 0x00020000>; linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>; linux,uefi-mmap-desc-ver = <0x00000001>; linux,uefi-mmap-desc-size = <0x00000030>; linux,uefi-mmap-size = <0x00000e40>; linux,uefi-mmap-start = <0x00000000 0x30288018>; linux,uefi-system-table = <0x00000000 0x3ed50018>; bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200"; linux,initrd-end = <0x00000000 0x2fbff9e0>; linux,initrd-start = <0x00000000 0x2e84d000>; }; }; initrd: base fe70000, size 13b29e0h (20654560), end 112229e0 [snip..] 
sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c sym: sha256_starts value: 11240eb0 addr: 11240018 machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6 sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c sym: sha256_update value: 11245158 addr: 11240034 machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449 sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc sym: sha256_finish value: 11245164 addr: 11240050 machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445 sym: memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34 sym: memcmp value: 11240634 addr: 11240060 machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240070 machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240078 machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240088 machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400a8 machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400b0 machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400c0 machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400d4 machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112453a8 addr: 112400f0 
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245338 addr: 112400f8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245358 addr: 11240100 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245368 addr: 11240108 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 1124536e addr: 11240110 machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245370 addr: 11240118 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 1124012c machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106 sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4 sym: setup_arch value: 11240ea8 addr: 11240130 machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0 sym: verify_sha256_digest value: 11240000 addr: 11240134 machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3 sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4 sym: post_verification_setup_arch value: 11240ea4 addr: 11240144 machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245380 addr: 11240148 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 112401ac 
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240220 machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240478 machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245392 addr: 112404b8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 11240538 machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 112405c8 machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2 sym: purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28 sym: purgatory value: 11240120 addr: 11240678 machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8 sym: arm64_kernel_entry value: 112454c8 addr: 1124067c machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271 sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8 sym: arm64_dtb_addr value: 112454d0 addr: 11240680 machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 112450bc machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98 sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245118 machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245130 machine_apply_elf_rel: CALL26 
aa1503e094000000->aa1503e097ffed39 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 1124513c machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78 sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112454d8 addr: 11245330 machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8 kexec_load: entry = 0x11240670 flags = 0xb70001 nr_segments = 5 segment[0].buf = 0xffff968d0010 segment[0].bufsz = 0xdf9200 segment[0].mem = 0xe880000 segment[0].memsz = 0x15f0000 segment[1].buf = 0xffff950e0010 segment[1].bufsz = 0x13b29e0 segment[1].mem = 0xfe70000 segment[1].memsz = 0x13c0000 segment[2].buf = 0x1115b440 segment[2].bufsz = 0x33d segment[2].mem = 0x11230000 segment[2].memsz = 0x10000 segment[3].buf = 0x1115bb70 segment[3].bufsz = 0x5518 segment[3].mem = 0x11240000 segment[3].memsz = 0x10000 segment[4].buf = 0x11159ca0 segment[4].bufsz = 0x1000 segment[4].mem = 0x2e7f0000 segment[4].memsz = 0x10000 Regards, Bhupesh > > >> Regards, >> Bhupesh >> >> >> >> > >> >> > Regards, >> >> > Bhupesh >> >> > >> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> > >> via a kernel command line parameter, "memmap=". >> >> > >> >> >> > _______________________________________________ >> >> > kexec mailing list -- kexec at lists.fedoraproject.org >> >> > To unsubscribe send an email to kexec-leave at lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 22:28 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-18 22:28 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, linux-efi, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 4:48 PM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > Bhupesh, > > On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: >> On Mon, Dec 18, 2017 at 11:24 AM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > On Mon, Dec 18, 2017 at 01:16:57PM +0800, Dave Young wrote: >> >> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it >> >> to kexec@lists.infradead.org >> >> >> >> Also add linux-acpi list >> > >> > Thank you. >> > >> >> On 12/18/17 at 02:31am, Bhupesh Sharma wrote: >> >> > On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel >> >> > <ard.biesheuvel@linaro.org> wrote: >> >> > > On 15 December 2017 at 09:59, AKASHI Takahiro >> >> > > <takahiro.akashi@linaro.org> wrote: >> >> > >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> > >>> On 13 December 2017 at 12:16, AKASHI Takahiro >> >> > >>> <takahiro.akashi@linaro.org> wrote: >> >> > >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> > >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> > >>> >> <takahiro.akashi@linaro.org> wrote: >> >> > >>> >> > Bhupesh, Ard, >> >> > >>> >> > >> >> > >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> > >>> >> >> Hi Ard, Akashi >> >> > >>> >> >> >> >> > >>> >> > (snip) >> >> > >>> >> > >> >> > >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> > >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> > >>> >> >> identify its own usable 
memory and exclude, at its boot time, any >> >> > >>> >> >> other memory areas that are part of the panicked kernel's memory. >> >> > >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> > >>> >> >> , for details) >> >> > >>> >> > >> >> > >>> >> > Right. >> >> > >>> >> > >> >> > >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >> > >>> >> >> with the crashkernel memory range: >> >> > >>> >> >> >> >> > >>> >> >> /* add linux,usable-memory-range */ >> >> > >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> > >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> > >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> > >>> >> >> address_cells, size_cells); >> >> > >>> >> >> >> >> > >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> > >>> >> >> , for details) >> >> > >>> >> >> >> >> > >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> > >>> >> >> they are marked as System RAM or as RESERVED. As, >> >> > >>> >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> > >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> > >>> >> >> >> >> > >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> > >>> >> >> ACPI memory and crashes while trying to access the same: >> >> > >>> >> >> >> >> > >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> > >>> >> >> -r`.img --reuse-cmdline -d >> >> > >>> >> >> >> >> > >>> >> >> [snip..] 
>> >> > >>> >> >> >> >> > >>> >> >> Reserved memory range >> >> > >>> >> >> 000000000e800000-000000002e7fffff (0) >> >> > >>> >> >> >> >> > >>> >> >> Coredump memory ranges >> >> > >>> >> >> 0000000000000000-000000000e7fffff (0) >> >> > >>> >> >> 000000002e800000-000000003961ffff (0) >> >> > >>> >> >> 0000000039d40000-000000003ed2ffff (0) >> >> > >>> >> >> 000000003ed60000-000000003fbfffff (0) >> >> > >>> >> >> 0000001040000000-0000001ffbffffff (0) >> >> > >>> >> >> 0000002000000000-0000002ffbffffff (0) >> >> > >>> >> >> 0000009000000000-0000009ffbffffff (0) >> >> > >>> >> >> 000000a000000000-000000affbffffff (0) >> >> > >>> >> >> >> >> > >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> >> > >>> >> >> memory cap'ing passed to the crash kernel inside >> >> > >>> >> >> 'arch/arm64/mm/init.c' (see below): >> >> > >>> >> >> >> >> > >>> >> >> static void __init fdt_enforce_memory_region(void) >> >> > >>> >> >> { >> >> > >>> >> >> struct memblock_region reg = { >> >> > >>> >> >> .size = 0, >> >> > >>> >> >> }; >> >> > >>> >> >> >> >> > >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> >> > >>> >> >> >> >> > >>> >> >> if (reg.size) >> >> > >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> >> > >>> >> >> comment this out */ >> >> > >>> >> >> } >> >> > >>> >> > >> >> > >>> >> > Please just don't do that. It can cause a fatal damage on >> >> > >>> >> > memory contents of the *crashed* kernel. >> >> > >>> >> > >> >> > >>> >> >> 5). Both the above temporary solutions fix the problem. >> >> > >>> >> >> >> >> > >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not >> >> > >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> >> > >>> >> >> fail. >> >> > >>> >> >> >> >> > >>> >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are >> >> > >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> > >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> > >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> > >>> >> >> dt node 'linux,usable-memory-range' >> >> > >>> >> > >> >> > >>> >> > I still don't understand why we need to carry over the information >> >> > >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> > >>> >> > such regions are free to be reused by the kernel after some point of >> >> > >>> >> > initialization. Why does crash dump kernel need to know about them? >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> Not really. According to the UEFI spec, they can be reclaimed after >> >> > >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> > >>> >> no longer needs them. Of course, in order to be able to boot a kexec >> >> > >>> >> kernel, those regions needs to be preserved, which is why they are >> >> > >>> >> memblock_reserve()'d now. >> >> > >>> > >> >> > >>> > For my better understandings, who is actually accessing such regions >> >> > >>> > during boot time, uefi itself or efistub? >> >> > >>> > >> >> > >>> >> >> > >>> No, only the kernel. This is where the ACPI tables are stored. 
For >> >> > >>> instance, on QEMU we have >> >> > >>> >> >> > >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> > >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> > >>> 01000013) >> >> > >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> > >>> BXPC 00000001) >> >> > >>> >> >> > >>> covered by >> >> > >>> >> >> > >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> > >>> ... >> >> > >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> > >> >> >> > >> OK. I mistakenly understood those regions could be freed after exiting >> >> > >> UEFI boot services. >> >> > >> >> >> > >>> >> >> > >>> >> So it seems that kexec does not honour the memblock_reserve() table >> >> > >>> >> when booting the next kernel. >> >> > >>> > >> >> > >>> > not really. >> >> > >>> > >> >> > >>> >> > (In other words, can or should we skip some part of ACPI-related init code >> >> > >>> >> > on crash dump kernel?) >> >> > >>> >> > >> >> > >>> >> >> >> > >>> >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> > >>> >> regions only revealed the bug, not created it (given that other >> >> > >>> >> memblock_reserve regions may be affected as well) >> >> > >>> > >> >> > >>> > As whether we should honor such reserved regions over kexec'ing >> >> > >>> > depends on each one's specific nature, we will have to take care one-by-one. >> >> > >>> > As a matter of fact, no information about "reserved" memblocks is >> >> > >>> > exposed to user space (via proc/iomem). >> >> > >>> > >> >> > >>> >> >> > >>> That is why I suggested (somewhere in this thread?) to not expose them >> >> > >>> as 'System RAM'. Do you think that could solve this? >> >> > >> >> >> > >> Memblock-reserv'ing them is necessary to prevent their corruption and >> >> > >> marking them under another name in /proc/iomem would also be good in order >> >> > >> not to allocate them as part of crash kernel's memory. >> >> > >> >> >> > > >> >> > > I agree. However, this may not be entirely trivial, since iterating >> >> > > over the memblock_reserved table and creating iomem entries may result >> >> > > in collisions. >> >> > >> >> > I found a method (using the patch I shared earlier in this thread) to mark these >> >> > entries as 'ACPI reclaim memory' ranges rather than System RAM or >> >> > reserved regions. >> >> > >> >> > >> But I'm not still convinced that we should export them in useable- >> >> > >> memory-range to crash dump kernel. They will be accessed through >> >> > >> acpi_os_map_memory() and so won't be required to be part of system ram >> >> > >> (or memblocks), I guess. >> >> > > >> >> > > Agreed. They will be covered by the linear mapping in the boot kernel, >> >> > > and be mapped explicitly via ioremap_cache() in the kexec kernel, >> >> > > which is exactly what we want in this case. >> >> > >> >> > Now this is what is confusing me. I don't see the above happening. 
>> >> > >> >> > I see that the primary kernel boots up and adds the ACPI regions via: >> >> > acpi_os_ioremap >> >> > -> ioremap_cache >> >> > >> >> > But during the crashkernel boot, ''acpi_os_ioremap' calls >> >> > 'ioremap' for the ACPI Reclaim Memory regions and not the _cache >> >> > variant. >> > >> > It is natural if that region is out of memblocks. >> >> Thanks for the confirmation. This was my understanding as well. >> >> >> > And it fails while accessing the ACPI tables: >> >> > >> >> > [ 0.039205] ACPI: Core revision 20170728 >> >> > pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 >> >> > [ 0.095098] Internal error: Oops: 96000021 [#1] SMP >> > >> > this (ESR = 0x96000021) means that Data Abort and Alignment fault happened. >> > As ioremap() makes the mapping as "Device memory", unaligned memory >> > access won't be allowed. >> > >> >> > [ 0.100022] Modules linked in: >> >> > [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 >> >> > [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 >> >> > [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] >> >> > pstate: 60000045 >> >> > [ 0.132647] sp : ffff000008ccfb40 >> >> > [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 >> >> > [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 >> >> > [ 0.146718] x25: 000000000000001b x24: 0000000000000001 >> >> > [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 >> >> > [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 >> >> > [ 0.162812] x19: 000000000000001b x18: 0000000000000005 >> >> > [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 >> >> > [ 0.173541] x15: 0000000000000000 x14: 000000000000038e >> >> > [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff >> >> > [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 >> >> > [ 0.189634] x9 : 000000000000005f 
x8 : ffff8000126d0140 >> >> > [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 >> >> > [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 >> >> > [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 >> >> > [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 >> >> > [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) >> >> > [ 0.223224] Call trace: >> >> > [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) >> >> > [ 0.232194] fa00: 0000000000000000 ffff000009710027 >> >> > ffff0000095e3980 ffff000008ccfbe0 >> >> > [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 >> >> > ffff000008ccfc50 0000000000000000 >> >> > [ 0.248018] fa40: ffff8000126d0140 000000000000005f >> >> > 00000000ffffff76 0000000000000006 >> >> > [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 >> >> > 000000000000038e 0000000000000000 >> >> > [ 0.263843] fa80: 0000000000000000 0000000000000000 >> >> > 0000000000000005 000000000000001b >> >> > [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 >> >> > ffff000009710027 0000000000000001 >> >> > [ 0.279667] fac0: 0000000000000001 000000000000001b >> >> > 0000000000000000 ffff0000088be820 >> >> > [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 >> >> > ffff00000849b4f8 ffff000008ccfb40 >> >> > [ 0.295491] fb00: ffff0000084a6764 0000000060000045 >> >> > ffff000008ccfb40 ffff000008260a18 >> >> > [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 >> >> > ffff000008ccfb40 ffff0000084a6764 >> >> > [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 >> >> > [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 >> >> > [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 >> >> > [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 >> >> > [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 >> >> > [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 >> >> > [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 
>> >> > [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 >> >> > [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc >> >> > [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 >> >> > [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 >> >> > [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 >> >> > [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c >> >> > [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) >> >> > [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- >> >> > [ 0.399160] Kernel panic - not syncing: Fatal exception >> >> > [ 0.404437] Rebooting in 10 seconds. >> >> > >> >> > So, I think the linear mapping done by the primary kernel does not >> >> > make these accessible in the crash kernel directly. >> >> > >> >> > Any pointers? >> >> >> >> Can you get the code line number for acpi_ns_lookup+0x25c? >> > >> > So should we always avoid ioremap() in acpi_os_ioremap() entirely, or >> > modify acpi_ns_lookup() (or any acpi functions') to prevent unaligned >> > accesses? >> > (I didn't find out how unaligned accesses could happen there.) >> > >> >> Right. Like I captured somewhere in this thread (perhaps the first >> email on this subject), >> this is indeed an unaligned address access. >> >> Now, modifying acpi_os_ioremap() to not ioremap() and thus avoiding >> assigning this memory range >> as device memory doesn't seem a neat solution as it means we are not >> marking some thing with the right memory attribute and we can fall in >> similar/related issues later. >> >> Regarding the later suggestion, what I am seeing now is that the acpi >> table access functions are perhaps reused from the earlier x86 >> implementation, but on the arm64 (or even arm) arch we should not be >> allowing unaligned accesses which might cause UNDEFINED behaviour and >> resultant crash. >> >> So I can try going this approach and see if it works for me. 
>> >> However, I am still not very sure as to why the crashkernel ranges >> historically do not include the System RAM regions (which may include >> the ACPI regions as well). These regions are available for the kernel >> usage and perhaps should be exported to the crashkernel as well. >> >> I am not fully aware of the previous discussions on capp'ing the >> crashkernel memory being passed to the kdump kernel, but did we run >> into any issues while doing so? >> >> Also, even if I extend the kexec-tools to modify the >> linux,usable-memory-range and add the ACPI regions to it, the >> crashkernel fails to boot with the below message (I have added some >> logic to print the DTB on the crash kernel boot start): >> >> [ 0.000000] chosen { >> [ 0.000000] linux,usable-memory-range >> [ 0.000000] = < >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x0e800000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x20000000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x396c0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x000a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x39770000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00040000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x398a0000 >> [ 0.000000] 0x00000000 >> [ 0.000000] 0x00020000 >> [ 0.000000] > >> [ 0.000000] ; >> >> [snip..] >> >> [ 0.000000] linux,usable-memory-range base e800000, size 20000000 >> [ 0.000000] - e800000 , 20000000 >> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 >> [ 0.000000] - 396c0000 , a0000 >> [ 0.000000] linux,usable-memory-range base 39770000, size 40000 >> [ 0.000000] - 39770000 , 40000 >> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 >> [ 0.000000] - 398a0000 , 20000 >> [ 0.000000] initrd not fully accessible via the linear mapping -- >> please check your bootloader ... 
>> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 >> arm64_memblock_init+0x210/0x484 >> [ 0.000000] Modules linked in: >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 >> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 >> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] >> pstate: 600000c5 >> [ 0.000000] sp : ffff000008ccfe80 >> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 >> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 >> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 >> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 >> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 >> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 >> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 >> [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 >> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 >> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 >> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 >> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d >> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 >> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 >> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 >> [ 0.000000] Call trace: >> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) >> [ 0.000000] fd40: 0000000000000056 0000000000000000 >> 0000000000000000 0000000000000000 >> [ 0.000000] fd60: 0000000000000001 ffff000008c96360 >> 000000000000000d 746f6f622072756f >> [ 0.000000] fd80: ffff000008517414 00000000000000f4 >> 2065687420616976 6d207261656e696c >> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d >> 79206b6365686320 000000002be00842 >> [ 0.000000] fdc0: ffff000008d05580 0000000000000000 >> 000000000c283806 
ffff000008afa000 >> [ 0.000000] fde0: ffff000008080000 ffff000008afa000 >> ffff000009680000 ffff000008ec0000 >> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 >> 00000000013b0000 0000000011230000 >> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 >> ffff000008b76984 ffff000008ccfe80 >> [ 0.000000] fe40: ffff000008b76984 00000000600000c5 >> ffff00000959b7a8 ffff000008ec0000 >> [ 0.000000] fe60: ffffffffffffffff 0000000000000005 >> ffff000008ccfe80 ffff000008b76984 >> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 >> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] random: get_random_bytes called from >> print_oops_end_marker+0x50/0x6c with crng_init=0 >> [ 0.000000] ---[ end trace 0000000000000000 ]--- >> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr >> [ 0.000000] cma: Failed to reserve 512 MiB >> [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate >> 0x0000000000010000 bytes below 0x0000000000000000. >> [ 0.000000] >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W >> ------------ 4.14.0+ #7 >> [ 0.000000] Call trace: >> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c >> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c >> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 >> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 >> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c >> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 >> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 >> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 >> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to >> allocate 0x0000000000010000 bytes below 0x0000000000000000. 
>> [ 0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>
> Please show me:
> * "Virtual kernel memory layout" in dmesg
> * /proc/iomem
> * debug messages from kexec-tools (kexec -d)

So here are the changes which I have made so far in the kernel and
kexec-tools to expose the ACPI reclaim regions as identifiable regions
in '/proc/iomem' and to append them to the DTB property
linux,usable-memory-range:

Linux patch:
<https://github.com/bhupesh-sharma/linux/commit/88d2ff6a1c16f5aa107b567a9d9c60343e52f263>,
and
<https://github.com/bhupesh-sharma/linux/commit/23262febd29a6665d483a707a05f8869757b8848>

kexec-tools patch:
<https://github.com/bhupesh-sharma/kexec-tools/commit/3e3d7c50648b1195674d1b7667cbbfd8d899b650>

Note that I am not very clear about the hole margins that kexec-tools
adds (to satisfy the crashkernel's expectation that the kernel image
and initrd lie within a 1G boundary), so I have not added my temporary
changes to the github code, but any suggestions on how to correctly put
them in place would be appreciated.
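
For reference, here is a minimal sketch of how the cells in the
linux,usable-memory-range dump quoted above map to the (base, size)
pairs that the crash kernel prints. It assumes #address-cells = 2 and
#size-cells = 2, which is consistent with the 16 cells in the dump
decoding to four ranges. The names decode_ranges and CELLS are purely
illustrative and are not part of the kernel or kexec-tools.

```python
from typing import List, Tuple


def decode_ranges(cells: List[int],
                  addr_cells: int = 2,
                  size_cells: int = 2) -> List[Tuple[int, int]]:
    """Decode a flat list of 32-bit DT cells into (base, size) pairs.

    addr_cells/size_cells default to 2, which is an assumption that
    matches the dump in this thread, not something read from the DTB.
    """
    stride = addr_cells + size_cells
    if len(cells) % stride:
        raise ValueError("cell count is not a multiple of address+size cells")

    def join(words: List[int]) -> int:
        # Cells are big-endian 32-bit words; the most significant comes first.
        value = 0
        for word in words:
            value = (value << 32) | (word & 0xFFFFFFFF)
        return value

    ranges = []
    for i in range(0, len(cells), stride):
        base = join(cells[i:i + addr_cells])
        size = join(cells[i + addr_cells:i + stride])
        ranges.append((base, size))
    return ranges


# Cells copied from the DTB dump quoted earlier in this mail.
CELLS = [
    0x00000000, 0x0e800000, 0x00000000, 0x20000000,
    0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
    0x00000000, 0x39770000, 0x00000000, 0x00040000,
    0x00000000, 0x398a0000, 0x00000000, 0x00020000,
]

for base, size in decode_ranges(CELLS):
    print(f"base {base:x}, size {size:x}, end {base + size - 1:x}")
```

The first decoded range ends at 2e7fffff and the remaining three end at
3975ffff, 397affff and 398bffff, i.e. the 'Crash kernel' entry and the
three 'ACPI reclaim region' entries in the /proc/iomem output in this
mail.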
And here are the rest of the inputs you asked for:

(1) # dmesg | grep -A 15 -B 4 -i "Virtual kernel memory layout"

[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.15.0-rc2-next-20171207+ root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off crashkernel=512M rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200
[ 0.000000] PCIe ASPM is disabled
[ 0.000000] software IO TLB [mem 0x35620000-0x39620000] (64MB) mapped at [ (ptrval)- (ptrval)]
[ 0.000000] Memory: 267251520K/268169216K available (7868K kernel code, 1764K rwdata, 3328K rodata, 1280K init, 7727K bss, 917696K reserved, 0K cma-reserved)
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000 ( 128 MB)
[ 0.000000] vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000 (126847 GB)
[ 0.000000] .text : 0x (ptrval) - 0x (ptrval) ( 7872 KB)
[ 0.000000] .rodata : 0x (ptrval) - 0x (ptrval) ( 3392 KB)
[ 0.000000] .init : 0x (ptrval) - 0x (ptrval) ( 1280 KB)
[ 0.000000] .data : 0x (ptrval) - 0x (ptrval) ( 1765 KB)
[ 0.000000] .bss : 0x (ptrval) - 0x (ptrval) ( 7728 KB)
[ 0.000000] fixed : 0xffff7fdffe7b0000 - 0xffff7fdffec00000 ( 4416 KB)
[ 0.000000] PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000 ( 16 MB)
[ 0.000000] vmemmap : 0xffff7fe000000000 - 0xffff800000000000 ( 128 GB maximum)
[ 0.000000] 0xffff7fe000000000 - 0xffff7fe02bff0000 ( 703 MB actual)
[ 0.000000] memory : 0xffff800000000000 - 0xffff80affc000000 (720832 MB)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64, Nodes=4
[ 0.000000] ftrace: allocating 29903 entries in 8 pages
[ 0.000000] Hierarchical RCU implementation.
(2) # cat /proc/iomem 00000000-3961ffff : System RAM 00080000-00b7ffff : Kernel code 00cc0000-0166ffff : Kernel data 0e800000-2e7fffff : Crash kernel 39620000-396bffff : reserved 396c0000-3975ffff : ACPI reclaim region 39760000-3976ffff : reserved 39770000-397affff : ACPI reclaim region 397b0000-3989ffff : reserved 398a0000-398bffff : ACPI reclaim region 398c0000-39d3ffff : reserved 39d40000-3ed2ffff : System RAM 3ed30000-3ed5ffff : reserved 3ed60000-3fbfffff : System RAM 40500000-40500fff : sbsa-gwdt.0 40500000-40500fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 40600000-40600fff : sbsa-gwdt.0 60080000-6008ffff : HISI0152:00 602b0000-602b0fff : ARMH0011:00 602b0000-602b0fff : ARMH0011:00 603c0000-603cffff : HISI0141:00 603c0000-603cffff : HISI0141:00 a0080000-a008ffff : HISI0152:05 a0080000-a008ffff : HISI0152:04 a0080000-a008ffff : HISI0152:03 a00a0000-a00affff : pnp 00:01 a01b0000-a01b0fff : HISI0191:00 a2000000-a200ffff : HISI0162:01 a2000000-a200ffff : HISI0162:01 a3000000-a300ffff : HISI0162:02 a3000000-a300ffff : HISI0162:02 a7020000-a702ffff : PNP0D20:00 a7020000-a702ffff : PNP0D20:00 b0000000-be7fffff : PCI Bus 0002:e8 b0000000-b06fffff : PCI Bus 0002:e9 b0000000-b00fffff : 0002:e9:00.0 b0000000-b00fffff : igb b0100000-b01fffff : 0002:e9:00.0 b0200000-b02fffff : 0002:e9:00.1 b0200000-b02fffff : igb b0300000-b03fffff : 0002:e9:00.1 b0400000-b04fffff : 0002:e9:00.2 b0400000-b04fffff : igb b0500000-b05fffff : 0002:e9:00.3 b0500000-b05fffff : igb b0600000-b0603fff : 0002:e9:00.0 b0600000-b0603fff : igb b0604000-b0607fff : 0002:e9:00.1 b0604000-b0607fff : igb b0608000-b060bfff : 0002:e9:00.2 b0608000-b060bfff : igb b060c000-b060ffff : 0002:e9:00.3 b060c000-b060ffff : igb b0700000-b0afffff : PCI Bus 0002:e9 b0700000-b077ffff : 0002:e9:00.0 b0780000-b07fffff : 0002:e9:00.0 b0800000-b087ffff : 0002:e9:00.1 b0880000-b08fffff : 0002:e9:00.1 b0900000-b097ffff : 0002:e9:00.2 b0980000-b09fffff : 0002:e9:00.2 b0a00000-b0a7ffff : 0002:e9:00.3 b0a80000-b0afffff : 
0002:e9:00.3 b0b00000-b0b0ffff : 0002:e8:00.0 be800000-beffffff : PCI ECAM c0080000-c008ffff : HISI0152:02 c0080000-c008ffff : HISI0152:01 c3000000-c300ffff : HISI0162:00 c3000000-c300ffff : HISI0162:00 c5000000-c588ffff : HISI00B2:00 c5000000-c588ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 c7000000-c705ffff : HISI00B2:00 d0080000-d008ffff : HISI0152:07 d0080000-d008ffff : HISI0152:06 d0100000-d010ffff : HISI02A1:00 d0100000-d010ffff : HISI02A1:00 400000000-4007fffff : PCI ECAM 440000000-4ffffffff : PCI Bus 0005:00 440000000-4407fffff : PCI Bus 0005:01 440000000-4403fffff : 0005:01:00.0 440400000-4407fffff : 0005:01:00.1 440800000-4421fffff : PCI Bus 0005:01 440800000-440bfffff : 0005:01:00.0 440800000-440bfffff : ixgbe 440c00000-440ffffff : 0005:01:00.1 440c00000-440ffffff : ixgbe 441000000-4413fffff : 0005:01:00.0 441400000-4417fffff : 0005:01:00.0 441800000-441bfffff : 0005:01:00.1 441c00000-441ffffff : 0005:01:00.1 442000000-442003fff : 0005:01:00.0 442000000-442003fff : ixgbe 442004000-442007fff : 0005:01:00.1 442004000-442007fff : ixgbe 442200000-442200fff : 0005:00:00.0 700090000-70009ffff : pnp 00:03 7000a0000-7000affff : pnp 00:05 7000b0000-7000bffff : pnp 00:06 700200000-70020ffff : pnp 00:04 740800000-740ffffff : PCI ECAM 741000000-77ffeffff : PCI Bus 0006:08 741000000-74100ffff : 0006:08:00.0 784000000-7847fffff : PCI ECAM 784800000-7bffeffff : PCI Bus 0007:40 784800000-7849fffff : PCI Bus 0007:41 784800000-7849fffff : 0007:41:00.0 786000000-787ffffff : PCI Bus 0007:41 786000000-787ffffff : 0007:41:00.0 7c4800000-7c4ffffff : PCI ECAM 7c5000000-7fffeffff : PCI Bus 0004:48 7c5000000-7c51fffff : PCI Bus 0004:49 7c5000000-7c50fffff : 0004:49:00.0 7c5100000-7c513ffff : 0004:49:00.0 7c5100000-7c513ffff : mpt3sas 7c5140000-7c514ffff : 0004:49:00.0 7c5140000-7c514ffff : mpt3sas 7c5200000-7c520ffff : 0004:48:00.0 1040000000-1ffbffffff : System RAM 2000000000-2ffbffffff : System RAM 9000000000-9ffbffffff : System RAM a000000000-affbffffff : System RAM 
400c0080000-400c008ffff : HISI0152:08 600a00a0000-600a00affff : pnp 00:08 64001000000-64001ffffff : PCI ECAM 65040000000-650ffffffff : PCI Bus 000a:10 65040000000-6504000ffff : 000a:10:00.0 700a0090000-700a009ffff : pnp 00:0a 700a0200000-700a020ffff : pnp 00:0b 74002000000-74002ffffff : PCI ECAM 75040000000-750ffffffff : PCI Bus 000c:20 75040000000-7504000ffff : 000c:20:00.0 78003000000-78003ffffff : PCI ECAM 79040000000-790ffffffff : PCI Bus 000d:30 79040000000-79040000fff : 000d:30:00.0 (3) # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d arch_process_options:149: command_line: root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200 arch_process_options:151: initrd: /boot/initramfs-4.15.0-rc2-next-20171207+.img arch_process_options:152: dtb: (null) Try gzip decompression. 
kernel: 0xffff968d0010 kernel_size: 0xdf9200 get_memory_ranges_iomem_cb: 0000000000000000 - 000000003961ffff : System RAM get_memory_ranges_iomem_cb: 0000000039620000 - 00000000396bffff : reserved get_memory_ranges_iomem_cb: 0000000039760000 - 000000003976ffff : reserved get_memory_ranges_iomem_cb: 00000000397b0000 - 000000003989ffff : reserved get_memory_ranges_iomem_cb: 00000000398c0000 - 0000000039d3ffff : reserved get_memory_ranges_iomem_cb: 0000000039d40000 - 000000003ed2ffff : System RAM get_memory_ranges_iomem_cb: 000000003ed30000 - 000000003ed5ffff : reserved get_memory_ranges_iomem_cb: 000000003ed60000 - 000000003fbfffff : System RAM get_memory_ranges_iomem_cb: 0000001040000000 - 0000001ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000002000000000 - 0000002ffbffffff : System RAM get_memory_ranges_iomem_cb: 0000009000000000 - 0000009ffbffffff : System RAM get_memory_ranges_iomem_cb: 000000a000000000 - 000000affbffffff : System RAM elf_arm64_probe: Not an ELF executable. image_arm64_load: kernel_segment: 000000000e800000 image_arm64_load: text_offset: 0000000000080000 image_arm64_load: image_size: 00000000015f0000 image_arm64_load: phys_offset: 0000000000000000 image_arm64_load: vp_offset: ffffffffffffffff image_arm64_load: PE format: yes Reserved memory range 000000000e800000-000000002e7fffff (0) Coredump memory ranges 0000000000000000-000000000e7fffff (0) 000000002e800000-000000003961ffff (0) 0000000039d40000-000000003ed2ffff (0) 000000003ed60000-000000003fbfffff (0) 0000001040000000-0000001ffbffffff (0) 0000002000000000-0000002ffbffffff (0) 0000009000000000-0000009ffbffffff (0) 000000a000000000-000000affbffffff (0) ACPI reclaim memory ranges 00000000396c0000-000000003975ffff (0) 0000000039770000-00000000397affff (0) 00000000398a0000-00000000398bffff (0) crashkernel memory ranges 000000000e800000-000000002e7fffff (0) 00000000396c0000-000000003975ffff (0) 0000000039770000-00000000397affff (0) 00000000398a0000-00000000398bffff (0) kernel symbol _text 
vaddr = ffff000008080000 load_crashdump_segments: page_offset: ffff800000000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 
0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 
0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x0 p_paddr = 0x0 p_vaddr = 0xffff800000000000 p_filesz = 0xe800000 p_memsz = 0xe800000 Elf header: p_type = 1, p_offset = 0x2e800000 p_paddr = 0x2e800000 p_vaddr = 0xffff80002e800000 p_filesz = 0xae20000 p_memsz = 0xae20000 Elf header: p_type = 1, p_offset = 0x39d40000 p_paddr = 0x39d40000 p_vaddr = 0xffff800039d40000 p_filesz = 0x4ff0000 p_memsz = 0x4ff0000 Elf header: p_type = 1, p_offset = 0x3ed60000 p_paddr = 0x3ed60000 p_vaddr = 0xffff80003ed60000 p_filesz = 0xea0000 p_memsz = 0xea0000 Elf header: p_type = 1, p_offset = 0x1040000000 p_paddr = 0x1040000000 p_vaddr = 0xffff801040000000 p_filesz = 0xfbc000000 p_memsz = 0xfbc000000 Elf header: p_type = 1, p_offset = 0x2000000000 p_paddr = 0x2000000000 p_vaddr = 0xffff802000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0x9000000000 p_paddr = 0x9000000000 p_vaddr = 0xffff809000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 Elf header: p_type = 1, p_offset = 0xa000000000 p_paddr = 0xa000000000 p_vaddr = 0xffff80a000000000 p_filesz = 0xffc000000 p_memsz = 0xffc000000 get_crash_notes_per_cpu: crash_notes addr = 1ff7cf3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7cf3200 p_paddr = 0x1ff7cf3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d23200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d23200 p_paddr = 0x1ff7d23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d53200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7d53200 p_paddr = 0x1ff7d53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7d83200, size = 424 Elf header: p_type = 4, p_offset = 
0x1ff7d83200 p_paddr = 0x1ff7d83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7db3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7db3200 p_paddr = 0x1ff7db3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7de3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7de3200 p_paddr = 0x1ff7de3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e13200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e13200 p_paddr = 0x1ff7e13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e43200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e43200 p_paddr = 0x1ff7e43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7e73200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7e73200 p_paddr = 0x1ff7e73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ea3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ea3200 p_paddr = 0x1ff7ea3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7ed3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7ed3200 p_paddr = 0x1ff7ed3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f03200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f03200 p_paddr = 0x1ff7f03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f33200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f33200 p_paddr = 0x1ff7f33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f63200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7f63200 p_paddr = 0x1ff7f63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7f93200, size = 424 
Elf header: p_type = 4, p_offset = 0x1ff7f93200 p_paddr = 0x1ff7f93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 1ff7fc3200, size = 424 Elf header: p_type = 4, p_offset = 0x1ff7fc3200 p_paddr = 0x1ff7fc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d13200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d13200 p_paddr = 0x2ff7d13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d43200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d43200 p_paddr = 0x2ff7d43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7d73200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7d73200 p_paddr = 0x2ff7d73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7da3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7da3200 p_paddr = 0x2ff7da3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7dd3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7dd3200 p_paddr = 0x2ff7dd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e03200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e03200 p_paddr = 0x2ff7e03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e33200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e33200 p_paddr = 0x2ff7e33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e63200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e63200 p_paddr = 0x2ff7e63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7e93200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7e93200 p_paddr = 0x2ff7e93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: 
crash_notes addr = 2ff7ec3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ec3200 p_paddr = 0x2ff7ec3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7ef3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7ef3200 p_paddr = 0x2ff7ef3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f23200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f23200 p_paddr = 0x2ff7f23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f53200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f53200 p_paddr = 0x2ff7f53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7f83200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7f83200 p_paddr = 0x2ff7f83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fb3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fb3200 p_paddr = 0x2ff7fb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 2ff7fe3200, size = 424 Elf header: p_type = 4, p_offset = 0x2ff7fe3200 p_paddr = 0x2ff7fe3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d03200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d03200 p_paddr = 0x9ff7d03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d33200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d33200 p_paddr = 0x9ff7d33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d63200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d63200 p_paddr = 0x9ff7d63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7d93200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7d93200 p_paddr = 0x9ff7d93200 p_vaddr = 0x0 p_filesz = 0x1a8 
p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7dc3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7dc3200 p_paddr = 0x9ff7dc3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7df3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7df3200 p_paddr = 0x9ff7df3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e23200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e23200 p_paddr = 0x9ff7e23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e53200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e53200 p_paddr = 0x9ff7e53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7e83200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7e83200 p_paddr = 0x9ff7e83200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7eb3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7eb3200 p_paddr = 0x9ff7eb3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7ee3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7ee3200 p_paddr = 0x9ff7ee3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f13200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f13200 p_paddr = 0x9ff7f13200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f43200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f43200 p_paddr = 0x9ff7f43200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7f73200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7f73200 p_paddr = 0x9ff7f73200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fa3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fa3200 p_paddr = 
0x9ff7fa3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = 9ff7fd3200, size = 424 Elf header: p_type = 4, p_offset = 0x9ff7fd3200 p_paddr = 0x9ff7fd3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7883200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7883200 p_paddr = 0xaff7883200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78b3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78b3200 p_paddr = 0xaff78b3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff78e3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff78e3200 p_paddr = 0xaff78e3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7913200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7913200 p_paddr = 0xaff7913200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7943200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7943200 p_paddr = 0xaff7943200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7973200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7973200 p_paddr = 0xaff7973200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79a3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79a3200 p_paddr = 0xaff79a3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff79d3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff79d3200 p_paddr = 0xaff79d3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a03200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a03200 p_paddr = 0xaff7a03200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a33200, size = 424 Elf header: p_type = 
4, p_offset = 0xaff7a33200 p_paddr = 0xaff7a33200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a63200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a63200 p_paddr = 0xaff7a63200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7a93200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7a93200 p_paddr = 0xaff7a93200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7ac3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7ac3200 p_paddr = 0xaff7ac3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7af3200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7af3200 p_paddr = 0xaff7af3200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b23200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b23200 p_paddr = 0xaff7b23200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 get_crash_notes_per_cpu: crash_notes addr = aff7b53200, size = 424 Elf header: p_type = 4, p_offset = 0xaff7b53200 p_paddr = 0xaff7b53200 p_vaddr = 0x0 p_filesz = 0x1a8 p_memsz = 0x1a8 vmcoreinfo header: p_type = 4, p_offset = 0x9fc4720000 p_paddr = 0x9fc4720000 p_vaddr = 0x0 p_filesz = 0x10024 p_memsz = 0x10024 Kernel text Elf header: p_type = 1, p_offset = 0x80000 p_paddr = 0x80000 p_vaddr = 0xffff000008080000 p_filesz = 0x15f0000 p_memsz = 0x15f0000 Elf header: p_type = 1, p_offset = 0x396c0000 p_paddr = 0x396c0000 p_vaddr = 0xffff8000396c0000 p_filesz = 0xa0000 p_memsz = 0xa0000 Elf header: p_type = 1, p_offset = 0x39770000 p_paddr = 0x39770000 p_vaddr = 0xffff800039770000 p_filesz = 0x40000 p_memsz = 0x40000 Elf header: p_type = 1, p_offset = 0x398a0000 p_paddr = 0x398a0000 p_vaddr = 0xffff8000398a0000 p_filesz = 0x20000 p_memsz = 0x20000 load_crashdump_segments: elfcorehdr 0x2e7f0000-0x2e7f0fff read_1st_dtb: found /sys/firmware/fdt get_cells_size: 
#address-cells:2 #size-cells:2 cells_size_fitted: 2e7f0000-2e7f0fff cells_size_fitted: e800000-2e7fffff cells_size_fitted: 396c0000-3975ffff cells_size_fitted: 39770000-397affff cells_size_fitted: 398a0000-398bffff / { #size-cells = <0x00000002>; #address-cells = <0x00000002>; chosen { linux,usable-memory-range = <0x00000000 0x0e800000 0x00000000 0x20000000 0x00000000 0x396c0000 0x00000000 0x000a0000 0x00000000 0x39770000 0x00000000 0x00040000 0x00000000 0x398a0000 0x00000000 0x00020000>; linux,elfcorehdr = <0x00000000 0x2e7f0000 0x00000000 0x00001000>; linux,uefi-mmap-desc-ver = <0x00000001>; linux,uefi-mmap-desc-size = <0x00000030>; linux,uefi-mmap-size = <0x00000e40>; linux,uefi-mmap-start = <0x00000000 0x30288018>; linux,uefi-system-table = <0x00000000 0x3ed50018>; bootargs = "root=/dev/mapper/rhelaa_huawei--t2280--01-root ro earlycon=pl011,mmio,0x602B0000 efi=debug memblock_debug=1 pcie_aspm=off rd.lvm.lv=rhelaa_huawei-t2280-01/root rd.lvm.lv=rhelaa_huawei-t2280-01/swap acpi=force console=ttyAMA0,115200"; linux,initrd-end = <0x00000000 0x2fbff9e0>; linux,initrd-start = <0x00000000 0x2e84d000>; }; }; initrd: base fe70000, size 13b29e0h (20654560), end 112229e0 [snip..] 
sym: sha256_starts info: 12 other: 00 shndx: 1 value: eb0 size: 6c sym: sha256_starts value: 11240eb0 addr: 11240018 machine_apply_elf_rel: CALL26 580006b394000000->580006b3940003a6 sym: sha256_update info: 12 other: 00 shndx: 1 value: 5158 size: c sym: sha256_update value: 11245158 addr: 11240034 machine_apply_elf_rel: CALL26 9100427394000000->9100427394001449 sym: sha256_finish info: 12 other: 00 shndx: 1 value: 5164 size: 1cc sym: sha256_finish value: 11245164 addr: 11240050 machine_apply_elf_rel: CALL26 aa1403e094000000->aa1403e094001445 sym: memcmp info: 12 other: 00 shndx: 1 value: 634 size: 34 sym: memcmp value: 11240634 addr: 11240060 machine_apply_elf_rel: CALL26 340003c094000000->340003c094000175 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240070 machine_apply_elf_rel: CALL26 5800046094000000->5800046094000135 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240078 machine_apply_elf_rel: CALL26 5800047594000000->5800047594000133 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 11240088 machine_apply_elf_rel: CALL26 9100067394000000->910006739400012f sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400a8 machine_apply_elf_rel: CALL26 5800036094000000->5800036094000127 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400b0 machine_apply_elf_rel: CALL26 910402e194000000->910402e194000125 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400c0 machine_apply_elf_rel: CALL26 9100067394000000->9100067394000121 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 112400d4 machine_apply_elf_rel: CALL26 5280002094000000->528000209400011c sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112453a8 addr: 112400f0 
machine_apply_elf_rel: ABS64 0000000000000000->00000000112453a8 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245338 addr: 112400f8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245338 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245358 addr: 11240100 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245368 addr: 11240108 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245368 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 1124536e addr: 11240110 machine_apply_elf_rel: ABS64 0000000000000000->000000001124536e sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245370 addr: 11240118 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245370 sym: printf info: 12 other: 00 shndx: 1 value: 544 size: 90 sym: printf value: 11240544 addr: 1124012c machine_apply_elf_rel: CALL26 9400000094000000->9400000094000106 sym: setup_arch info: 12 other: 00 shndx: 1 value: ea8 size: 4 sym: setup_arch value: 11240ea8 addr: 11240130 machine_apply_elf_rel: CALL26 9400000094000000->940000009400035e sym: verify_sha256_digest info: 12 other: 00 shndx: 1 value: 0 size: f0 sym: verify_sha256_digest value: 11240000 addr: 11240134 machine_apply_elf_rel: CALL26 3400004094000000->3400004097ffffb3 sym: post_verification_setup_arch info: 12 other: 00 shndx: 1 value: ea4 size: 4 sym: post_verification_setup_arch value: 11240ea4 addr: 11240144 machine_apply_elf_rel: JUMP26 0000000014000000->0000000014000358 sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245380 addr: 11240148 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245380 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 112401ac 
machine_apply_elf_rel: CALL26 f94037a194000000->f94037a19400033d sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240220 machine_apply_elf_rel: CALL26 910006f794000000->910006f794000320 sym: putchar info: 12 other: 00 shndx: 1 value: ea0 size: 4 sym: putchar value: 11240ea0 addr: 11240478 machine_apply_elf_rel: CALL26 9100073994000000->910007399400028a sym: .rodata.str1.1 info: 03 other: 00 shndx: 3 value: 0 size: 0 sym: .rodata.str1.1 value: 11245392 addr: 112404b8 machine_apply_elf_rel: ABS64 0000000000000000->0000000011245392 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 11240538 machine_apply_elf_rel: CALL26 a8d07bfd94000000->a8d07bfd97ffff06 sym: vsprintf info: 12 other: 00 shndx: 1 value: 150 size: 364 sym: vsprintf value: 11240150 addr: 112405c8 machine_apply_elf_rel: CALL26 a8d17bfd94000000->a8d17bfd97fffee2 sym: purgatory info: 12 other: 00 shndx: 1 value: 120 size: 28 sym: purgatory value: 11240120 addr: 11240678 machine_apply_elf_rel: CALL26 5800001194000000->5800001197fffeaa sym: arm64_kernel_entry info: 10 other: 00 shndx: 4 value: 120 size: 8 sym: arm64_kernel_entry value: 112454c8 addr: 1124067c machine_apply_elf_rel: LD_PREL_LO19 5800000058000011->5800000058027271 sym: arm64_dtb_addr info: 10 other: 00 shndx: 4 value: 128 size: 8 sym: arm64_dtb_addr value: 112454d0 addr: 11240680 machine_apply_elf_rel: LD_PREL_LO19 aa1f03e158000000->aa1f03e158027280 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 112450bc machine_apply_elf_rel: CALL26 d101029494000000->d101029497ffef98 sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245118 machine_apply_elf_rel: JUMP26 b4fffc5814000000->b4fffc5817ffed3f sym: memcpy info: 12 other: 00 shndx: 1 value: 614 size: 20 sym: memcpy value: 11240614 addr: 11245130 machine_apply_elf_rel: CALL26 
aa1503e094000000->aa1503e097ffed39 sym: sha256_process info: 12 other: 00 shndx: 1 value: f1c size: 4134 sym: sha256_process value: 11240f1c addr: 1124513c machine_apply_elf_rel: CALL26 cb1302d694000000->cb1302d697ffef78 sym: .data info: 03 other: 00 shndx: 4 value: 0 size: 0 sym: .data value: 112454d8 addr: 11245330 machine_apply_elf_rel: ABS64 0000000000000000->00000000112454d8 kexec_load: entry = 0x11240670 flags = 0xb70001 nr_segments = 5 segment[0].buf = 0xffff968d0010 segment[0].bufsz = 0xdf9200 segment[0].mem = 0xe880000 segment[0].memsz = 0x15f0000 segment[1].buf = 0xffff950e0010 segment[1].bufsz = 0x13b29e0 segment[1].mem = 0xfe70000 segment[1].memsz = 0x13c0000 segment[2].buf = 0x1115b440 segment[2].bufsz = 0x33d segment[2].mem = 0x11230000 segment[2].memsz = 0x10000 segment[3].buf = 0x1115bb70 segment[3].bufsz = 0x5518 segment[3].mem = 0x11240000 segment[3].memsz = 0x10000 segment[4].buf = 0x11159ca0 segment[4].bufsz = 0x1000 segment[4].mem = 0x2e7f0000 segment[4].memsz = 0x10000 Regards, Bhupesh > > >> Regards, >> Bhupesh >> >> >> >> > >> >> > Regards, >> >> > Bhupesh >> >> > >> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> > >> via a kernel command line parameter, "memmap=". >> >> > >> >> >> > _______________________________________________ >> >> > kexec mailing list -- kexec@lists.fedoraproject.org >> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
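For readers following the kexec debug output above: the `linux,usable-memory-range` property is a flat list of 32-bit cells, grouped according to `#address-cells = 2` and `#size-cells = 2`. The sketch below is only an illustration and not kexec-tools code; the helper names `decode_ranges` and `segment_fits` are hypothetical. It decodes the cells copied from the dump and checks whether the initrd segment (`segment[1].mem = 0xfe70000`, `memsz = 0x13c0000`) falls inside one of the decoded ranges.

```python
# Illustrative helpers (hypothetical names, not part of kexec-tools).

def decode_ranges(cells, address_cells=2, size_cells=2):
    """Group 32-bit DT cells into (base, size) pairs."""
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("cell count is not a multiple of the entry size")
    ranges = []
    for i in range(0, len(cells), stride):
        base = 0
        for cell in cells[i:i + address_cells]:
            base = (base << 32) | cell
        size = 0
        for cell in cells[i + address_cells:i + stride]:
            size = (size << 32) | cell
        ranges.append((base, size))
    return ranges


def segment_fits(mem, memsz, ranges):
    """True if [mem, mem + memsz) lies entirely inside one range."""
    return any(base <= mem and mem + memsz <= base + size
               for base, size in ranges)


# Cells copied from the linux,usable-memory-range property in the dump.
CELLS = [
    0x00000000, 0x0e800000, 0x00000000, 0x20000000,
    0x00000000, 0x396c0000, 0x00000000, 0x000a0000,
    0x00000000, 0x39770000, 0x00000000, 0x00040000,
    0x00000000, 0x398a0000, 0x00000000, 0x00020000,
]

RANGES = decode_ranges(CELLS)
for base, size in RANGES:
    # Same "start-end" form as the cells_size_fitted lines in the log.
    print(f"{base:x}-{base + size - 1:x}")

# initrd segment as reported by kexec_load (segment[1]).
INITRD_FITS = segment_fits(0xfe70000, 0x13c0000, RANGES)
print("initrd inside a usable range:", INITRD_FITS)
```

The printed ranges should line up with the `cells_size_fitted` lines in the log, which is a quick way to cross-check that the property was written the way kexec-tools reported it.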
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-18 8:59 ` Bhupesh SHARMA (?) (?) @ 2017-12-19 5:01 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-19 5:01 UTC (permalink / raw) To: Bhupesh SHARMA Cc: Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, James Morse, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, Matt Fleming On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: > > [snip..] > > [ 0.000000] linux,usable-memory-range base e800000, size 20000000 > [ 0.000000] - e800000 , 20000000 > [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 > [ 0.000000] - 396c0000 , a0000 > [ 0.000000] linux,usable-memory-range base 39770000, size 40000 > [ 0.000000] - 39770000 , 40000 > [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 > [ 0.000000] - 398a0000 , 20000 > [ 0.000000] initrd not fully accessible via the linear mapping -- > please check your bootloader ... This is an odd message coming from: |void __init arm64_memblock_init(void) |... | | if (WARN(base < memblock_start_of_DRAM() || | base + size > memblock_start_of_DRAM() + | linear_region_size, | "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) { Can you confirm how the condition breaks here? I suppose base: 0xfe70000 size: 0x13c0000 memblock_start_of_DRAM(): 0xe800000 according to the information you gave me. 
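To make the question concrete, the WARN condition quoted above can be evaluated by hand. This is a minimal sketch, not kernel code: the function name is made up, and the linear region size of 1 << 47 is an assumption (it would correspond to a 48-bit VA configuration, which the 0xffff8000... addresses in the logs suggest but do not prove). The base, size and start-of-DRAM values are the supposed ones from this mail.

```python
def initrd_outside_linear_map(base, size, dram_start, linear_region_size):
    """Mirror of the condition inside WARN() in arm64_memblock_init()."""
    return (base < dram_start or
            base + size > dram_start + linear_region_size)


BASE = 0xfe70000             # supposed initrd base
SIZE = 0x13c0000             # supposed initrd size
DRAM_START = 0xe800000       # supposed memblock_start_of_DRAM()
LINEAR_REGION_SIZE = 1 << 47 # ASSUMPTION: 48-bit VA, not confirmed in the thread

WOULD_WARN = initrd_outside_linear_map(BASE, SIZE, DRAM_START,
                                       LINEAR_REGION_SIZE)
print("initrd end:", hex(BASE + SIZE))
print("WARN would fire:", WOULD_WARN)
```

Under these assumptions neither half of the condition holds, which is why the message looks odd: if it does fire on the real system, at least one of the values seen by the crash kernel has to differ from the ones supposed here.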
Thanks, -Takahiro AKASHI > [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 > arm64_memblock_init+0x210/0x484 > [ 0.000000] Modules linked in: > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 > [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 > [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 > [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 > [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] > pstate: 600000c5 > [ 0.000000] sp : ffff000008ccfe80 > [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 > [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 > [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 > [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 > [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 > [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 > [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 > [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320 > [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 > [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 > [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 > [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d > [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 > [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 > [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 > [ 0.000000] Call trace: > [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) > [ 0.000000] fd40: 0000000000000056 0000000000000000 > 0000000000000000 0000000000000000 > [ 0.000000] fd60: 0000000000000001 ffff000008c96360 > 000000000000000d 746f6f622072756f > [ 0.000000] fd80: ffff000008517414 00000000000000f4 > 2065687420616976 6d207261656e696c > [ 0.000000] fda0: 2d20676e69707061 657361656c70202d > 79206b6365686320 000000002be00842 > [ 0.000000] fdc0: ffff000008d05580 0000000000000000 > 000000000c283806 ffff000008afa000 > [ 
0.000000] fde0: ffff000008080000 ffff000008afa000 > ffff000009680000 ffff000008ec0000 > [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 > 00000000013b0000 0000000011230000 > [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 > ffff000008b76984 ffff000008ccfe80 > [ 0.000000] fe40: ffff000008b76984 00000000600000c5 > ffff00000959b7a8 ffff000008ec0000 > [ 0.000000] fe60: ffffffffffffffff 0000000000000005 > ffff000008ccfe80 ffff000008b76984 > [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 > [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] random: get_random_bytes called from > print_oops_end_marker+0x50/0x6c with crng_init=0 > [ 0.000000] ---[ end trace 0000000000000000 ]--- > [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr > [ 0.000000] cma: Failed to reserve 512 MiB > [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate > 0x0000000000010000 bytes below 0x0000000000000000. > [ 0.000000] > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W > ------------ 4.14.0+ #7 > [ 0.000000] Call trace: > [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c > [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c > [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 > [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 > [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c > [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 > [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 > [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 > [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 > [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c > [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to > allocate 0x0000000000010000 bytes below 0x0000000000000000. 
> [ 0.000000] > > I guess it is because of the 1G alignment requirement between the > kernel image and the initrd and how we populate the holes between the > kernel image, segments (including dtb) and the initrd from the > kexec-tools. > > Akashi, any pointers on this will be helpful as well. > > Regards, > Bhupesh > > > >> > > >> > Regards, > >> > Bhupesh > >> > > >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> > >> via a kernel command line parameter, "memmap=". > >> > >> > >> > _______________________________________________ > >> > kexec mailing list -- kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org > >> > To unsubscribe send an email to kexec-leave-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A@public.gmane.org ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-19  5:01 ` AKASHI Takahiro
  0 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  5:01 UTC (permalink / raw)
  To: Bhupesh SHARMA
  Cc: Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi,
	linux-kernel, linux-arm-kernel, James Morse, linux-efi,
	Mark Rutland, Matt Fleming

On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>
> [snip..]
>
> [ 0.000000] linux,usable-memory-range base e800000, size 20000000
> [ 0.000000] - e800000 , 20000000
> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000
> [ 0.000000] - 396c0000 , a0000
> [ 0.000000] linux,usable-memory-range base 39770000, size 40000
> [ 0.000000] - 39770000 , 40000
> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000
> [ 0.000000] - 398a0000 , 20000
> [ 0.000000] initrd not fully accessible via the linear mapping --
> please check your bootloader ...

This is an odd message coming from:
|void __init arm64_memblock_init(void)
|...
|
|	if (WARN(base < memblock_start_of_DRAM() ||
|		 base + size > memblock_start_of_DRAM() +
|			       linear_region_size,
|		"initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {

Can you confirm how the condition breaks here?
I suppose
    base: 0xfe70000
    size: 0x13c0000
    memblock_start_of_DRAM(): 0xe800000
according to the information you gave me.

Thanks,
-Takahiro AKASHI

> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
> arm64_memblock_init+0x210/0x484
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484
> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484
> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
> pstate: 600000c5
> [ 0.000000] sp : ffff000008ccfe80
> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000
> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000
> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000
> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806
> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580
> [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320
> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061
> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976
> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000
> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056
> [ 0.000000] Call trace:
> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
> [ 0.000000] fd40: 0000000000000056 0000000000000000
> 0000000000000000 0000000000000000
> [ 0.000000] fd60: 0000000000000001 ffff000008c96360
> 000000000000000d 746f6f622072756f
> [ 0.000000] fd80: ffff000008517414 00000000000000f4
> 2065687420616976 6d207261656e696c
> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d
> 79206b6365686320 000000002be00842
> [ 0.000000] fdc0: ffff000008d05580 0000000000000000
> 000000000c283806 ffff000008afa000
> [ 0.000000] fde0: ffff000008080000 ffff000008afa000
> ffff000009680000 ffff000008ec0000
> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000
> 00000000013b0000 0000000011230000
> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80
> ffff000008b76984 ffff000008ccfe80
> [ 0.000000] fe40: ffff000008b76984 00000000600000c5
> ffff00000959b7a8 ffff000008ec0000
> [ 0.000000] fe60: ffffffffffffffff 0000000000000005
> ffff000008ccfe80 ffff000008b76984
> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [ 0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x50/0x6c with crng_init=0
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
> [ 0.000000] cma: Failed to reserve 512 MiB
> [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
> 0x0000000000010000 bytes below 0x0000000000000000.
> [ 0.000000]
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W
> ------------ 4.14.0+ #7
> [ 0.000000] Call trace:
> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
> allocate 0x0000000000010000 bytes below 0x0000000000000000.
> [ 0.000000]
>
> I guess it is because of the 1G alignment requirement between the
> kernel image and the initrd and how we populate the holes between the
> kernel image, segments (including dtb) and the initrd from the
> kexec-tools.
>
> Akashi, any pointers on this will be helpful as well.
>
> Regards,
> Bhupesh
>
>
> >> >
> >> > Regards,
> >> > Bhupesh
> >> >
> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> > >> via a kernel command line parameter, "memmap=".
> >> > >>
> >> > _______________________________________________
> >> > kexec mailing list -- kexec@lists.fedoraproject.org
> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-19  5:01 ` AKASHI Takahiro
  (?)
  (?)
@ 2017-12-20 19:52 ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-20 19:52 UTC (permalink / raw)
  To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma,
	Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, linux-efi, Mark Rutland,
	Matt Fleming

On Tue, Dec 19, 2017 at 10:31 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote:
>>
>> [snip..]
>>
>> [ 0.000000] linux,usable-memory-range base e800000, size 20000000
>> [ 0.000000] - e800000 , 20000000
>> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000
>> [ 0.000000] - 396c0000 , a0000
>> [ 0.000000] linux,usable-memory-range base 39770000, size 40000
>> [ 0.000000] - 39770000 , 40000
>> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000
>> [ 0.000000] - 398a0000 , 20000
>> [ 0.000000] initrd not fully accessible via the linear mapping --
>> please check your bootloader ...
>
> This is an odd message coming from:
> |void __init arm64_memblock_init(void)
> |...
> |
> |	if (WARN(base < memblock_start_of_DRAM() ||
> |		 base + size > memblock_start_of_DRAM() +
> |			       linear_region_size,
> |		"initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
>
> Can you confirm how the condition breaks here?
> I suppose
>     base: 0xfe70000
>     size: 0x13c0000
>     memblock_start_of_DRAM(): 0xe800000
> according to the information you gave me.

Indeed, the first condition, 'base < memblock_start_of_DRAM()', in the
following check is true, so the WARN fires:

	if (WARN(base < memblock_start_of_DRAM() ||
		 base + size > memblock_start_of_DRAM() +
			       linear_region_size,

Here are the values I am seeing on this board, using a kernel and
kexec-tools that have been modified to append the ACPI reclaim regions
to 'linux,usable-memory-range':

base=fe70000, size=13c0000, memblock_start_of_DRAM=39620000 linear_region_size=800000000000

I suspect that the hole handling done by kexec-tools inside
'arm64_load_other_segments()' in 'kexec/arch/arm64/kexec-arm64.c' (see
the code snippet below):

	/* Put the other segments after the image. */

	hole_min = image_base + arm64_mem.image_size;
	if (info->kexec_flags & KEXEC_ON_CRASH)
		hole_max = crash_reserved_mem.end;
	else
		hole_max = ULONG_MAX;

should be updated to handle the ACPI reclaim regions appropriately.

I am not aware of the background of this handling in kexec-tools.
Do you think this can be at fault, Akashi?

Regards,
Bhupesh

>
>> [ 0.000000] ------------[ cut here ]------------
>> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [ 0.000000] Modules linked in:
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [ 0.000000] sp : ffff000008ccfe80
>> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [ 0.000000] Call trace:
>> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [ 0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [ 0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [ 0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [ 0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [ 0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [ 0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [ 0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [ 0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [ 0.000000] cma: Failed to reserve 512 MiB
>> [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [ 0.000000]
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W
>> ------------ 4.14.0+ #7
>> [ 0.000000] Call trace:
>> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [ 0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>
>> >> > _______________________________________________
>> >> > kexec mailing list -- kexec@lists.fedoraproject.org
>> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply	[flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-20 19:52 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-20 19:52 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, linux-efi, Mark Rutland, Matt Fleming On Tue, Dec 19, 2017 at 10:31 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: >> >> [snip..] >> >> [ 0.000000] linux,usable-memory-range base e800000, size 20000000 >> [ 0.000000] - e800000 , 20000000 >> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 >> [ 0.000000] - 396c0000 , a0000 >> [ 0.000000] linux,usable-memory-range base 39770000, size 40000 >> [ 0.000000] - 39770000 , 40000 >> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 >> [ 0.000000] - 398a0000 , 20000 >> [ 0.000000] initrd not fully accessible via the linear mapping -- >> please check your bootloader ... > > This is an odd message coming from: > |void __init arm64_memblock_init(void) > |... > | > | if (WARN(base < memblock_start_of_DRAM() || > | base + size > memblock_start_of_DRAM() + > | linear_region_size, > | "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) { > > Can you confirm how the condition breaks here? > I suppose > base: 0xfe70000 > size: 0x13c0000 > memblock_start_of_DRAM(): 0xe800000 > according to the information you gave me. 
Indeed, the first check 'base < memblock_start_of_DRAM()' in the following check fails: if (WARN(base < memblock_start_of_DRAM() || base + size > memblock_start_of_DRAM() + linear_region_size, Here are the values I am seeing on this board using the kernel and kexec-tools which have been modified to append the 'linux,usable-memory-range' with the acpi reclaim regions: base=fe70000, size=13c0000, memblock_start_of_DRAM=39620000 linear_region_size=800000000000 I suspect that the holes introduced by kexec-tools inside 'arm64_load_other_segments()' in 'kexec/arch/arm64/kexec-arm64.c' (see the code leg below): /* Put the other segments after the image. */ hole_min = image_base + arm64_mem.image_size; if (info->kexec_flags & KEXEC_ON_CRASH) hole_max = crash_reserved_mem.end; else hole_max = ULONG_MAX; should be updated to introduce appropriate handling of the acpi reclaim regions. I am not aware of the background of this handling in the kexec-tools. Do you think this can be at fault, Akashi? Regards, Bhupesh > >> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 >> arm64_memblock_init+0x210/0x484 >> [ 0.000000] Modules linked in: >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 >> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 >> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] >> pstate: 600000c5 >> [ 0.000000] sp : ffff000008ccfe80 >> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 >> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 >> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 >> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 >> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 >> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 >> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 >> [ 
0.000000] x15: 000000002be00842 x14: 79206b6365686320 >> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 >> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 >> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 >> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d >> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 >> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 >> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 >> [ 0.000000] Call trace: >> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) >> [ 0.000000] fd40: 0000000000000056 0000000000000000 >> 0000000000000000 0000000000000000 >> [ 0.000000] fd60: 0000000000000001 ffff000008c96360 >> 000000000000000d 746f6f622072756f >> [ 0.000000] fd80: ffff000008517414 00000000000000f4 >> 2065687420616976 6d207261656e696c >> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d >> 79206b6365686320 000000002be00842 >> [ 0.000000] fdc0: ffff000008d05580 0000000000000000 >> 000000000c283806 ffff000008afa000 >> [ 0.000000] fde0: ffff000008080000 ffff000008afa000 >> ffff000009680000 ffff000008ec0000 >> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 >> 00000000013b0000 0000000011230000 >> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 >> ffff000008b76984 ffff000008ccfe80 >> [ 0.000000] fe40: ffff000008b76984 00000000600000c5 >> ffff00000959b7a8 ffff000008ec0000 >> [ 0.000000] fe60: ffffffffffffffff 0000000000000005 >> ffff000008ccfe80 ffff000008b76984 >> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 >> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] random: get_random_bytes called from >> print_oops_end_marker+0x50/0x6c with crng_init=0 >> [ 0.000000] ---[ end trace 0000000000000000 ]--- >> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr >> [ 0.000000] cma: Failed to reserve 512 MiB >> [ 0.000000] Kernel panic - not syncing: ERROR: 
Failed to allocate >> 0x0000000000010000 bytes below 0x0000000000000000. >> [ 0.000000] >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W >> ------------ 4.14.0+ #7 >> [ 0.000000] Call trace: >> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c >> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c >> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 >> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 >> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c >> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 >> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 >> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 >> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to >> allocate 0x0000000000010000 bytes below 0x0000000000000000. >> [ 0.000000] >> >> I guess it is because of the 1G alignment requirement between the >> kernel image and the initrd and how we populate the holes between the >> kernel image, segments (including dtb) and the initrd from the >> kexec-tools. >> >> Akashi, any pointers on this will be helpful as well. >> >> Regards, >> Bhupesh >> >> >> >> > >> >> > Regards, >> >> > Bhupesh >> >> > >> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> > >> via a kernel command line parameter, "memmap=". >> >> > >> >> >> > _______________________________________________ >> >> > kexec mailing list -- kexec@lists.fedoraproject.org >> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 135+ messages in thread
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-20 19:52 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-20 19:52 UTC (permalink / raw) To: linux-arm-kernel On Tue, Dec 19, 2017 at 10:31 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: >> >> [snip..] >> >> [ 0.000000] linux,usable-memory-range base e800000, size 20000000 >> [ 0.000000] - e800000 , 20000000 >> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 >> [ 0.000000] - 396c0000 , a0000 >> [ 0.000000] linux,usable-memory-range base 39770000, size 40000 >> [ 0.000000] - 39770000 , 40000 >> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 >> [ 0.000000] - 398a0000 , 20000 >> [ 0.000000] initrd not fully accessible via the linear mapping -- >> please check your bootloader ... > > This is an odd message coming from: > |void __init arm64_memblock_init(void) > |... > | > | if (WARN(base < memblock_start_of_DRAM() || > | base + size > memblock_start_of_DRAM() + > | linear_region_size, > | "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) { > > Can you confirm how the condition breaks here? > I suppose > base: 0xfe70000 > size: 0x13c0000 > memblock_start_of_DRAM(): 0xe800000 > according to the information you gave me. 
Indeed, the first check 'base < memblock_start_of_DRAM()' in the following check fails: if (WARN(base < memblock_start_of_DRAM() || base + size > memblock_start_of_DRAM() + linear_region_size, Here are the values I am seeing on this board using the kernel and kexec-tools which have been modified to append the 'linux,usable-memory-range' with the acpi reclaim regions: base=fe70000, size=13c0000, memblock_start_of_DRAM=39620000 linear_region_size=800000000000 I suspect that the holes introduced by kexec-tools inside 'arm64_load_other_segments()' in 'kexec/arch/arm64/kexec-arm64.c' (see the code leg below): /* Put the other segments after the image. */ hole_min = image_base + arm64_mem.image_size; if (info->kexec_flags & KEXEC_ON_CRASH) hole_max = crash_reserved_mem.end; else hole_max = ULONG_MAX; should be updated to introduce appropriate handling of the acpi reclaim regions. I am not aware of the background of this handling in the kexec-tools. Do you think this can be at fault, Akashi? Regards, Bhupesh > >> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597 >> arm64_memblock_init+0x210/0x484 >> [ 0.000000] Modules linked in: >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7 >> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000 >> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484 >> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>] >> pstate: 600000c5 >> [ 0.000000] sp : ffff000008ccfe80 >> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018 >> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000 >> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000 >> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000 >> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000 >> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806 >> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580 >> [ 
0.000000] x15: 000000002be00842 x14: 79206b6365686320 >> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061 >> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976 >> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414 >> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d >> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001 >> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 >> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056 >> [ 0.000000] Call trace: >> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80) >> [ 0.000000] fd40: 0000000000000056 0000000000000000 >> 0000000000000000 0000000000000000 >> [ 0.000000] fd60: 0000000000000001 ffff000008c96360 >> 000000000000000d 746f6f622072756f >> [ 0.000000] fd80: ffff000008517414 00000000000000f4 >> 2065687420616976 6d207261656e696c >> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d >> 79206b6365686320 000000002be00842 >> [ 0.000000] fdc0: ffff000008d05580 0000000000000000 >> 000000000c283806 ffff000008afa000 >> [ 0.000000] fde0: ffff000008080000 ffff000008afa000 >> ffff000009680000 ffff000008ec0000 >> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000 >> 00000000013b0000 0000000011230000 >> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80 >> ffff000008b76984 ffff000008ccfe80 >> [ 0.000000] fe40: ffff000008b76984 00000000600000c5 >> ffff00000959b7a8 ffff000008ec0000 >> [ 0.000000] fe60: ffffffffffffffff 0000000000000005 >> ffff000008ccfe80 ffff000008b76984 >> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484 >> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] random: get_random_bytes called from >> print_oops_end_marker+0x50/0x6c with crng_init=0 >> [ 0.000000] ---[ end trace 0000000000000000 ]--- >> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr >> [ 0.000000] cma: Failed to reserve 512 MiB >> [ 0.000000] Kernel panic - not syncing: ERROR: 
Failed to allocate >> 0x0000000000010000 bytes below 0x0000000000000000. >> [ 0.000000] >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W >> ------------ 4.14.0+ #7 >> [ 0.000000] Call trace: >> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c >> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c >> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8 >> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0 >> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c >> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38 >> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74 >> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544 >> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4 >> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c >> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to >> allocate 0x0000000000010000 bytes below 0x0000000000000000. >> [ 0.000000] >> >> I guess it is because of the 1G alignment requirement between the >> kernel image and the initrd and how we populate the holes between the >> kernel image, segments (including dtb) and the initrd from the >> kexec-tools. >> >> Akashi, any pointers on this will be helpful as well. >> >> Regards, >> Bhupesh >> >> >> >> > >> >> > Regards, >> >> > Bhupesh >> >> > >> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> > >> via a kernel command line parameter, "memmap=". >> >> > >> >> >> > _______________________________________________ >> >> > kexec mailing list -- kexec at lists.fedoraproject.org >> >> > To unsubscribe send an email to kexec-leave at lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-20 19:52 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-20 19:52 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh SHARMA, Dave Young, Bhupesh Sharma, Ard Biesheuvel, kexec, linux-acpi, linux-kernel, linux-arm-kernel, James Morse, linux-efi, Mark Rutland, Matt Fleming On Tue, Dec 19, 2017 at 10:31 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Mon, Dec 18, 2017 at 02:29:05PM +0530, Bhupesh SHARMA wrote: >> >> [snip..] >> >> [ 0.000000] linux,usable-memory-range base e800000, size 20000000 >> [ 0.000000] - e800000 , 20000000 >> [ 0.000000] linux,usable-memory-range base 396c0000, size a0000 >> [ 0.000000] - 396c0000 , a0000 >> [ 0.000000] linux,usable-memory-range base 39770000, size 40000 >> [ 0.000000] - 39770000 , 40000 >> [ 0.000000] linux,usable-memory-range base 398a0000, size 20000 >> [ 0.000000] - 398a0000 , 20000 >> [ 0.000000] initrd not fully accessible via the linear mapping -- >> please check your bootloader ... > > This is an odd message coming from: > |void __init arm64_memblock_init(void) > |... > | > | if (WARN(base < memblock_start_of_DRAM() || > | base + size > memblock_start_of_DRAM() + > | linear_region_size, > | "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) { > > Can you confirm how the condition breaks here? > I suppose > base: 0xfe70000 > size: 0x13c0000 > memblock_start_of_DRAM(): 0xe800000 > according to the information you gave me. 
Indeed, the first condition, 'base < memblock_start_of_DRAM()', in the
following check is the one that triggers:

    if (WARN(base < memblock_start_of_DRAM() ||
             base + size > memblock_start_of_DRAM() +
                           linear_region_size,

Here are the values I am seeing on this board using the kernel and
kexec-tools which have been modified to append the
'linux,usable-memory-range' with the acpi reclaim regions:

    base=fe70000, size=13c0000, memblock_start_of_DRAM=39620000
    linear_region_size=800000000000

I suspect that the holes introduced by kexec-tools inside
'arm64_load_other_segments()' in 'kexec/arch/arm64/kexec-arm64.c' (see
the code snippet below):

    /* Put the other segments after the image. */

    hole_min = image_base + arm64_mem.image_size;
    if (info->kexec_flags & KEXEC_ON_CRASH)
        hole_max = crash_reserved_mem.end;
    else
        hole_max = ULONG_MAX;

should be updated to introduce appropriate handling of the acpi reclaim
regions.

I am not aware of the background of this handling in the kexec-tools.
Do you think this can be at fault, Akashi?

Regards,
Bhupesh

>
>> [ 0.000000] ------------[ cut here ]------------
>> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:597
>> arm64_memblock_init+0x210/0x484
>> [ 0.000000] Modules linked in:
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0+ #7
>> [ 0.000000] task: ffff000008d05580 task.stack: ffff000008cc0000
>> [ 0.000000] PC is at arm64_memblock_init+0x210/0x484
>> [ 0.000000] LR is at arm64_memblock_init+0x210/0x484
>> [ 0.000000] pc : [<ffff000008b76984>] lr : [<ffff000008b76984>]
>> pstate: 600000c5
>> [ 0.000000] sp : ffff000008ccfe80
>> [ 0.000000] x29: ffff000008ccfe80 x28: 000000000f370018
>> [ 0.000000] x27: 0000000011230000 x26: 00000000013b0000
>> [ 0.000000] x25: 000000000fe80000 x24: ffff000008cf3000
>> [ 0.000000] x23: ffff000008ec0000 x22: ffff000009680000
>> [ 0.000000] x21: ffff000008afa000 x20: ffff000008080000
>> [ 0.000000] x19: ffff000008afa000 x18: 000000000c283806
>> [ 0.000000] x17: 0000000000000000 x16: ffff000008d05580
>> [ 0.000000] x15: 000000002be00842 x14: 79206b6365686320
>> [ 0.000000] x13: 657361656c70202d x12: 2d20676e69707061
>> [ 0.000000] x11: 6d207261656e696c x10: 2065687420616976
>> [ 0.000000] x9 : 00000000000000f4 x8 : ffff000008517414
>> [ 0.000000] x7 : 746f6f622072756f x6 : 000000000000000d
>> [ 0.000000] x5 : ffff000008c96360 x4 : 0000000000000001
>> [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000
>> [ 0.000000] x1 : 0000000000000000 x0 : 0000000000000056
>> [ 0.000000] Call trace:
>> [ 0.000000] Exception stack(0xffff000008ccfd40 to 0xffff000008ccfe80)
>> [ 0.000000] fd40: 0000000000000056 0000000000000000
>> 0000000000000000 0000000000000000
>> [ 0.000000] fd60: 0000000000000001 ffff000008c96360
>> 000000000000000d 746f6f622072756f
>> [ 0.000000] fd80: ffff000008517414 00000000000000f4
>> 2065687420616976 6d207261656e696c
>> [ 0.000000] fda0: 2d20676e69707061 657361656c70202d
>> 79206b6365686320 000000002be00842
>> [ 0.000000] fdc0: ffff000008d05580 0000000000000000
>> 000000000c283806 ffff000008afa000
>> [ 0.000000] fde0: ffff000008080000 ffff000008afa000
>> ffff000009680000 ffff000008ec0000
>> [ 0.000000] fe00: ffff000008cf3000 000000000fe80000
>> 00000000013b0000 0000000011230000
>> [ 0.000000] fe20: 000000000f370018 ffff000008ccfe80
>> ffff000008b76984 ffff000008ccfe80
>> [ 0.000000] fe40: ffff000008b76984 00000000600000c5
>> ffff00000959b7a8 ffff000008ec0000
>> [ 0.000000] fe60: ffffffffffffffff 0000000000000005
>> ffff000008ccfe80 ffff000008b76984
>> [ 0.000000] [<ffff000008b76984>] arm64_memblock_init+0x210/0x484
>> [ 0.000000] [<ffff000008b7398c>] setup_arch+0x1b8/0x5f4
>> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [ 0.000000] random: get_random_bytes called from
>> print_oops_end_marker+0x50/0x6c with crng_init=0
>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>> [ 0.000000] Reserving 4KB of memory at 0x2e7f0000 for elfcorehdr
>> [ 0.000000] cma: Failed to reserve 512 MiB
>> [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate
>> 0x0000000000010000 bytes below 0x0000000000000000.
>> [ 0.000000]
>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W
>> ------------ 4.14.0+ #7
>> [ 0.000000] Call trace:
>> [ 0.000000] [<ffff000008088da8>] dump_backtrace+0x0/0x23c
>> [ 0.000000] [<ffff000008089008>] show_stack+0x24/0x2c
>> [ 0.000000] [<ffff0000087f647c>] dump_stack+0x84/0xa8
>> [ 0.000000] [<ffff0000080cfd44>] panic+0x138/0x2a0
>> [ 0.000000] [<ffff000008b95c88>] memblock_alloc_base+0x44/0x4c
>> [ 0.000000] [<ffff000008b95cbc>] memblock_alloc+0x2c/0x38
>> [ 0.000000] [<ffff000008b772dc>] early_pgtable_alloc+0x20/0x74
>> [ 0.000000] [<ffff000008b7755c>] paging_init+0x28/0x544
>> [ 0.000000] [<ffff000008b73990>] setup_arch+0x1bc/0x5f4
>> [ 0.000000] [<ffff000008b70a10>] start_kernel+0x74/0x43c
>> [ 0.000000] ---[ end Kernel panic - not syncing: ERROR: Failed to
>> allocate 0x0000000000010000 bytes below 0x0000000000000000.
>> [ 0.000000]
>>
>> I guess it is because of the 1G alignment requirement between the
>> kernel image and the initrd and how we populate the holes between the
>> kernel image, segments (including dtb) and the initrd from the
>> kexec-tools.
>>
>> Akashi, any pointers on this will be helpful as well.
>>
>> Regards,
>> Bhupesh
>>
>>
>> >> >
>> >> > Regards,
>> >> > Bhupesh
>> >> >
>> >> > >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> > >> via a kernel command line parameter, "memmap=".
>> >> > >>
>> >> > _______________________________________________
>> >> > kexec mailing list -- kexec@lists.fedoraproject.org
>> >> > To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply [flat|nested] 135+ messages in thread
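As a worked check of the values reported in the message above (base=fe70000, size=13c0000, memblock_start_of_DRAM=39620000, linear_region_size=800000000000), the two halves of the WARN() condition can be evaluated separately. This is only a sketch: the helper and macro names below are made up for illustration and are not kernel functions, and the constants are simply the hex values quoted in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hex values copied from the message; the macro names are hypothetical. */
#define REPORTED_BASE         0x0fe70000ULL
#define REPORTED_SIZE         0x013c0000ULL
#define REPORTED_DRAM_START   0x39620000ULL
#define REPORTED_LINEAR_SIZE  0x800000000000ULL

/* First half of the condition: base < memblock_start_of_DRAM(). */
static bool below_dram_start(uint64_t base, uint64_t dram_start)
{
	return base < dram_start;
}

/*
 * Second half of the condition:
 * base + size > memblock_start_of_DRAM() + linear_region_size.
 */
static bool beyond_linear_region(uint64_t base, uint64_t size,
				 uint64_t dram_start, uint64_t linear_size)
{
	return base + size > dram_start + linear_size;
}
```

With these values the first condition is true (0xfe70000 is below 0x39620000) while the second is not, which is consistent with the observation that only the first check triggers.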
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-18 5:16 ` Dave Young (?) (?) @ 2017-12-18 21:28 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-18 21:28 UTC (permalink / raw) To: Dave Young Cc: Ard Biesheuvel, kexec, linux-acpi, linux-kernel, AKASHI Takahiro, linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi, Mark Rutland, Matt Fleming

Hi Dave,

On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
>
> Also add linux-acpi list
> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >>> <takahiro.akashi@linaro.org> wrote:
>> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >>> >> <takahiro.akashi@linaro.org> wrote:
>> >>> >> > Bhupesh, Ard,
>> >>> >> >
>> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >>> >> >> Hi Ard, Akashi
>> >>> >> >>
>> >>> >> > (snip)
>> >>> >> >
>> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >>> >> >> , for details)
>> >>> >> >
>> >>> >> > Right.
>> >>> >> >
>> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >>> >> >> with the crashkernel memory range:
>> >>> >> >>
>> >>> >> >> /* add linux,usable-memory-range */
>> >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset,
>> >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>> >> >> address_cells, size_cells);
>> >>> >> >>
>> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >>> >> >> , for details)
>> >>> >> >>
>> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >>> >> >> they are marked as System RAM or as RESERVED. As,
>> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >>> >> >>
>> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >>> >> >> ACPI memory and crashes while trying to access the same:
>> >>> >> >>
>> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >>> >> >> -r`.img --reuse-cmdline -d
>> >>> >> >>
>> >>> >> >> [snip..]
>> >>> >> >>
>> >>> >> >> Reserved memory range
>> >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >>> >> >>
>> >>> >> >> Coredump memory ranges
>> >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >>> >> >> 000000002e800000-000000003961ffff (0)
>> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> >> >> 000000a000000000-000000affbffffff (0)
>> >>> >> >>
>> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> >> >> memory cap'ing passed to the crash kernel inside
>> >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> >> >>
>> >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >>> >> >> {
>> >>> >> >>     struct memblock_region reg = {
>> >>> >> >>         .size = 0,
>> >>> >> >>     };
>> >>> >> >>
>> >>> >> >>     of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> >> >>
>> >>> >> >>     if (reg.size)
>> >>> >> >>         //memblock_cap_memory_range(reg.base, reg.size); /*
>> >>> >> >> comment this out */
>> >>> >> >> }
>> >>> >> >
>> >>> >> > Please just don't do that. It can cause a fatal damage on
>> >>> >> > memory contents of the *crashed* kernel.
>> >>> >> >
>> >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >>> >> >>
>> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> >> >> fail.
>> >>> >> >>
>> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >>> >> >> dt node 'linux,usable-memory-range'
>> >>> >> >
>> >>> >> > I still don't understand why we need to carry over the information
>> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >>> >> > such regions are free to be reused by the kernel after some point of
>> >>> >> > initialization. Why does crash dump kernel need to know about them?
>> >>> >> >
>> >>> >>
>> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> >> kernel, those regions needs to be preserved, which is why they are
>> >>> >> memblock_reserve()'d now.
>> >>> >
>> >>> > For my better understandings, who is actually accessing such regions
>> >>> > during boot time, uefi itself or efistub?
>> >>> >
>> >>>
>> >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >>> instance, on QEMU we have
>> >>>
>> >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001
>> >>> 01000013)
>> >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001
>> >>> BXPC 00000001)
>> >>>
>> >>> covered by
>> >>>
>> >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>> ...
>> >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>
>> >> OK. I mistakenly understood those regions could be freed after exiting
>> >> UEFI boot services.
>> >>
>> >>>
>> >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> >> when booting the next kernel.
>> >>> >
>> >>> > not really.
>> >>> >
>> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> >> > on crash dump kernel?)
>> >>> >> >
>> >>> >>
>> >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> >> regions only revealed the bug, not created it (given that other
>> >>> >> memblock_reserve regions may be affected as well)
>> >>> >
>> >>> > As whether we should honor such reserved regions over kexec'ing
>> >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> >>> > As a matter of fact, no information about "reserved" memblocks is
>> >>> > exposed to user space (via proc/iomem).
>> >>> >
>> >>>
>> >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >>> as 'System RAM'. Do you think that could solve this?
>> >>
>> >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> marking them under another name in /proc/iomem would also be good in order
>> >> not to allocate them as part of crash kernel's memory.
>> >>
>> >
>> > I agree. However, this may not be entirely trivial, since iterating
>> > over the memblock_reserved table and creating iomem entries may result
>> > in collisions.
>>
>> I found a method (using the patch I shared earlier in this thread) to mark these
>> entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> reserved regions.
>>
>> >> But I'm not still convinced that we should export them in useable-
>> >> memory-range to crash dump kernel. They will be accessed through
>> >> acpi_os_map_memory() and so won't be required to be part of system ram
>> >> (or memblocks), I guess.
>> >
>> > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > which is exactly what we want in this case.
>>
>> Now this is what is confusing me. I don't see the above happening.
>>
>> I see that the primary kernel boots up and adds the ACPI regions via:
>> acpi_os_ioremap
>> -> ioremap_cache
>>
>> But during the crashkernel boot, ''acpi_os_ioremap' calls
>> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> variant.
>>
>> And it fails while accessing the ACPI tables:
>>
>> [ 0.039205] ACPI: Core revision 20170728
>> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> [ 0.095098] Internal error: Oops: 96000021 [#1] SMP
>> [ 0.100022] Modules linked in:
>> [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> pstate: 60000045
>> [ 0.132647] sp : ffff000008ccfb40
>> [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> [ 0.146718] x25: 000000000000001b x24: 0000000000000001
>> [ 0.152083] x23: 0000000000000001 x22: ffff000009710027
>> [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> [ 0.162812] x19: 000000000000001b x18: 0000000000000005
>> [ 0.168176] x17: 0000000000000000 x16: 0000000000000000
>> [ 0.173541] x15: 0000000000000000 x14: 000000000000038e
>> [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> [ 0.223224] Call trace:
>> [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> [ 0.232194] fa00: 0000000000000000 ffff000009710027
>> ffff0000095e3980 ffff000008ccfbe0
>> [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> ffff000008ccfc50 0000000000000000
>> [ 0.248018] fa40: ffff8000126d0140 000000000000005f
>> 00000000ffffff76 0000000000000006
>> [ 0.255931] fa60: ffffffffffffffff ffffffff00000000
>> 000000000000038e 0000000000000000
>> [ 0.263843] fa80: 0000000000000000 0000000000000000
>> 0000000000000005 000000000000001b
>> [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> ffff000009710027 0000000000000001
>> [ 0.279667] fac0: 0000000000000001 000000000000001b
>> 0000000000000000 ffff0000088be820
>> [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> ffff00000849b4f8 ffff000008ccfb40
>> [ 0.295491] fb00: ffff0000084a6764 0000000060000045
>> ffff000008ccfb40 ffff000008260a18
>> [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> ffff000008ccfb40 ffff0000084a6764
>> [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> [ 0.394500] ---[ end trace c46ed37f9651c58e ]---
>> [ 0.399160] Kernel panic - not syncing: Fatal exception
>> [ 0.404437] Rebooting in 10 seconds.
>>
>> So, I think the linear mapping done by the primary kernel does not
>> make these accessible in the crash kernel directly.
>>
>> Any pointers?
>
> Can you get the code line number for acpi_ns_lookup+0x25c?
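
For reference, the 'Oops: 96000021' value in the trace quoted above can be decoded with a few shifts and masks. A minimal sketch, assuming the standard ESR_ELx layout (exception class in bits [31:26], fault status code in bits [5:0]) and assuming that x22/x1 (ffff000009710027) holds the address being read; the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception class: bits [31:26] of the syndrome value. */
static uint32_t esr_exception_class(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Fault status code of a data abort: bits [5:0] of the syndrome value. */
static uint32_t esr_fault_status(uint32_t esr)
{
	return esr & 0x3f;
}

/* True when addr is a multiple of the access width in bytes. */
static bool is_aligned(uint64_t addr, uint64_t width)
{
	return (addr % width) == 0;
}
```

For 0x96000021 this gives an exception class of 0x25 (data abort taken from the same exception level) and a fault status of 0x21 (alignment fault), and ffff000009710027 is not 4-byte aligned. That would fit a 32-bit read through a Device-type mapping created by plain 'ioremap', where unaligned accesses are not permitted, but this is an interpretation of the trace rather than something confirmed in the thread.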

gdb points to the following code line number:

(gdb) list *(acpi_ns_lookup+0x25c)
0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
572             }
573         }
574
575         /* Extract one ACPI name from the front of the pathname */
576
577         ACPI_MOVE_32_TO_32(&simple_name, path);
578
579         /* Try to find the single (4 character) ACPI name */
580
581         status =
(gdb)

i.e. ACPI_MOVE_32_TO_32(&simple_name, path);

addr2line also confirms the same:

# addr2line -e vmlinux ffff0000084aa250
/root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577

Regards,
Bhupesh

>>
>> Regards,
>> Bhupesh
>>
>> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> via a kernel command line parameter, "memmap=".
>> >>
>> _______________________________________________
>> kexec mailing list -- kexec@lists.fedoraproject.org
>> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply [flat|nested] 135+ messages in thread
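The ranges printed by 'kexec -p ... -d' earlier in the thread make it possible to check where the faulting page sits. A minimal sketch, assuming the physical address can be read from the reported pte (00e8000039710707) by masking bits [47:12]; the struct and helper names are hypothetical, and the ranges are copied from the quoted debug output:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive physical address range (hypothetical helper type). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* "Reserved memory range" from the kexec debug output. */
static const struct mem_range crash_reserved = {
	0x000000000e800000ULL, 0x000000002e7fffffULL
};

/* "Coredump memory ranges" from the same output. */
static const struct mem_range coredump_ranges[] = {
	{ 0x0000000000000000ULL, 0x000000000e7fffffULL },
	{ 0x000000002e800000ULL, 0x000000003961ffffULL },
	{ 0x0000000039d40000ULL, 0x000000003ed2ffffULL },
	{ 0x000000003ed60000ULL, 0x000000003fbfffffULL },
	{ 0x0000001040000000ULL, 0x0000001ffbffffffULL },
	{ 0x0000002000000000ULL, 0x0000002ffbffffffULL },
	{ 0x0000009000000000ULL, 0x0000009ffbffffffULL },
	{ 0x000000a000000000ULL, 0x000000affbffffffULL },
};

#define COREDUMP_RANGE_COUNT \
	(sizeof(coredump_ranges) / sizeof(coredump_ranges[0]))

/* True when addr lies inside the inclusive range r. */
static bool range_contains(const struct mem_range *r, uint64_t addr)
{
	return addr >= r->start && addr <= r->end;
}

/* True when addr lies inside any of the n ranges. */
static bool in_any_range(const struct mem_range *ranges, size_t n,
			 uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (range_contains(&ranges[i], addr))
			return true;
	}
	return false;
}

/* Assumption: the output address occupies bits [47:12] of the pte. */
static uint64_t pte_to_phys(uint64_t pte)
{
	return pte & 0x0000fffffffff000ULL;
}
```

The resulting address, 0x39710000, is outside the crashkernel reservation and also falls in the gap between two coredump ranges (after 3961ffff and before 39d40000), i.e. in a region that was not described to the crash kernel.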
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-18 21:28 ` Bhupesh Sharma
  0 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-18 21:28 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dave,

On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> to kexec@lists.infradead.org
>
> Also add linux-acpi list
> On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
>> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>> > On 15 December 2017 at 09:59, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
>> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
>> >>> <takahiro.akashi@linaro.org> wrote:
>> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
>> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
>> >>> >> <takahiro.akashi@linaro.org> wrote:
>> >>> >> > Bhupesh, Ard,
>> >>> >> >
>> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
>> >>> >> >> Hi Ard, Akashi
>> >>> >> >>
>> >>> >> > (snip)
>> >>> >> >
>> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
>> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
>> >>> >> >> identify its own usable memory and exclude, at its boot time, any
>> >>> >> >> other memory areas that are part of the panicked kernel's memory.
>> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
>> >>> >> >> , for details)
>> >>> >> >
>> >>> >> > Right.
>> >>> >> >
>> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
>> >>> >> >> with the crashkernel memory range:
>> >>> >> >>
>> >>> >> >>         /* add linux,usable-memory-range */
>> >>> >> >>         nodeoffset = fdt_path_offset(new_buf, "/chosen");
>> >>> >> >>         result = fdt_setprop_range(new_buf, nodeoffset,
>> >>> >> >>                         PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
>> >>> >> >>                         address_cells, size_cells);
>> >>> >> >>
>> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
>> >>> >> >> , for details)
>> >>> >> >>
>> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
>> >>> >> >> they are marked as System RAM or as RESERVED. As,
>> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
>> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
>> >>> >> >>
>> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
>> >>> >> >> ACPI memory and crashes while trying to access the same:
>> >>> >> >>
>> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
>> >>> >> >> -r`.img --reuse-cmdline -d
>> >>> >> >>
>> >>> >> >> [snip..]
>> >>> >> >>
>> >>> >> >> Reserved memory range
>> >>> >> >> 000000000e800000-000000002e7fffff (0)
>> >>> >> >>
>> >>> >> >> Coredump memory ranges
>> >>> >> >> 0000000000000000-000000000e7fffff (0)
>> >>> >> >> 000000002e800000-000000003961ffff (0)
>> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> >> >> 000000a000000000-000000affbffffff (0)
>> >>> >> >>
>> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> >> >> memory cap'ing passed to the crash kernel inside
>> >>> >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> >> >>
>> >>> >> >> static void __init fdt_enforce_memory_region(void)
>> >>> >> >> {
>> >>> >> >>         struct memblock_region reg = {
>> >>> >> >>                 .size = 0,
>> >>> >> >>         };
>> >>> >> >>
>> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> >> >>
>> >>> >> >>         if (reg.size)
>> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >>> >> >> comment this out */
>> >>> >> >> }
>> >>> >> >
>> >>> >> > Please just don't do that. It can cause a fatal damage on
>> >>> >> > memory contents of the *crashed* kernel.
>> >>> >> >
>> >>> >> >> 5). Both the above temporary solutions fix the problem.
>> >>> >> >>
>> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> >> >> fail.
>> >>> >> >>
>> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
>> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
>> >>> >> >> dt node 'linux,usable-memory-range'
>> >>> >> >
>> >>> >> > I still don't understand why we need to carry over the information
>> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
>> >>> >> > such regions are free to be reused by the kernel after some point of
>> >>> >> > initialization. Why does crash dump kernel need to know about them?
>> >>> >> >
>> >>> >>
>> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> >> kernel, those regions needs to be preserved, which is why they are
>> >>> >> memblock_reserve()'d now.
>> >>> >
>> >>> > For my better understandings, who is actually accessing such regions
>> >>> > during boot time, uefi itself or efistub?
>> >>> >
>> >>>
>> >>> No, only the kernel. This is where the ACPI tables are stored. For
>> >>> instance, on QEMU we have
>> >>>
>> >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001
>> >>> 01000013)
>> >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001
>> >>> BXPC 00000001)
>> >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001
>> >>> BXPC 00000001)
>> >>>
>> >>> covered by
>> >>>
>> >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>> ...
>> >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>
>> >> OK. I mistakenly understood those regions could be freed after exiting
>> >> UEFI boot services.
>> >>
>> >>>
>> >>> >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> >> when booting the next kernel.
>> >>> >
>> >>> > not really.
>> >>> >
>> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> >> > on crash dump kernel?)
>> >>> >> >
>> >>> >>
>> >>> >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> >> regions only revealed the bug, not created it (given that other
>> >>> >> memblock_reserve regions may be affected as well)
>> >>> >
>> >>> > As whether we should honor such reserved regions over kexec'ing
>> >>> > depends on each one's specific nature, we will have to take care one-by-one.
>> >>> > As a matter of fact, no information about "reserved" memblocks is
>> >>> > exposed to user space (via proc/iomem).
>> >>> >
>> >>>
>> >>> That is why I suggested (somewhere in this thread?) to not expose them
>> >>> as 'System RAM'. Do you think that could solve this?
>> >>
>> >> Memblock-reserv'ing them is necessary to prevent their corruption and
>> >> marking them under another name in /proc/iomem would also be good in order
>> >> not to allocate them as part of crash kernel's memory.
>> >>
>> >
>> > I agree. However, this may not be entirely trivial, since iterating
>> > over the memblock_reserved table and creating iomem entries may result
>> > in collisions.
>>
>> I found a method (using the patch I shared earlier in this thread) to mark these
>> entries as 'ACPI reclaim memory' ranges rather than System RAM or
>> reserved regions.
>>
>> >> But I'm not still convinced that we should export them in useable-
>> >> memory-range to crash dump kernel. They will be accessed through
>> >> acpi_os_map_memory() and so won't be required to be part of system ram
>> >> (or memblocks), I guess.
>> >
>> > Agreed. They will be covered by the linear mapping in the boot kernel,
>> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
>> > which is exactly what we want in this case.
>>
>> Now this is what is confusing me. I don't see the above happening.
>>
>> I see that the primary kernel boots up and adds the ACPI regions via:
>> acpi_os_ioremap
>>     -> ioremap_cache
>>
>> But during the crashkernel boot, ''acpi_os_ioremap' calls
>> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
>> variant.
>>
>> And it fails while accessing the ACPI tables:
>>
>> [    0.039205] ACPI: Core revision 20170728
>> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707
>> [    0.095098] Internal error: Oops: 96000021 [#1] SMP
>> [    0.100022] Modules linked in:
>> [    0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
>> [    0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000
>> [    0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0
>> [    0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>]
>> pstate: 60000045
>> [    0.132647] sp : ffff000008ccfb40
>> [    0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4
>> [    0.141354] x27: ffff0000088be820 x26: 0000000000000000
>> [    0.146718] x25: 000000000000001b x24: 0000000000000001
>> [    0.152083] x23: 0000000000000001 x22: ffff000009710027
>> [    0.157447] x21: ffff000008ccfc50 x20: 0000000000000001
>> [    0.162812] x19: 000000000000001b x18: 0000000000000005
>> [    0.168176] x17: 0000000000000000 x16: 0000000000000000
>> [    0.173541] x15: 0000000000000000 x14: 000000000000038e
>> [    0.178905] x13: ffffffff00000000 x12: ffffffffffffffff
>> [    0.184270] x11: 0000000000000006 x10: 00000000ffffff76
>> [    0.189634] x9 : 000000000000005f x8 : ffff8000126d0140
>> [    0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50
>> [    0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001
>> [    0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980
>> [    0.211091] x1 : ffff000009710027 x0 : 0000000000000000
>> [    0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000)
>> [    0.223224] Call trace:
>> [    0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40)
>> [    0.232194] fa00: 0000000000000000 ffff000009710027
>> ffff0000095e3980 ffff000008ccfbe0
>> [    0.240106] fa20: 0000000000000001 ffff80000fe62c00
>> ffff000008ccfc50 0000000000000000
>> [    0.248018] fa40: ffff8000126d0140 000000000000005f
>> 00000000ffffff76 0000000000000006
>> [    0.255931] fa60: ffffffffffffffff ffffffff00000000
>> 000000000000038e 0000000000000000
>> [    0.263843] fa80: 0000000000000000 0000000000000000
>> 0000000000000005 000000000000001b
>> [    0.271754] faa0: 0000000000000001 ffff000008ccfc50
>> ffff000009710027 0000000000000001
>> [    0.279667] fac0: 0000000000000001 000000000000001b
>> 0000000000000000 ffff0000088be820
>> [    0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40
>> ffff00000849b4f8 ffff000008ccfb40
>> [    0.295491] fb00: ffff0000084a6764 0000000060000045
>> ffff000008ccfb40 ffff000008260a18
>> [    0.303403] fb20: ffffffffffffffff ffff0000087f3fb0
>> ffff000008ccfb40 ffff0000084a6764
>> [    0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0
>> [    0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294
>> [    0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198
>> [    0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270
>> [    0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8
>> [    0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8
>> [    0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184
>> [    0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68
>> [    0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc
>> [    0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264
>> [    0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0
>> [    0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0
>> [    0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c
>> [    0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
>> [    0.394500] ---[ end trace c46ed37f9651c58e ]---
>> [    0.399160] Kernel panic - not syncing: Fatal exception
>> [    0.404437] Rebooting in 10 seconds.
>>
>> So, I think the linear mapping done by the primary kernel does not
>> make these accessible in the crash kernel directly.
>>
>> Any pointers?
>
> Can you get the code line number for acpi_ns_lookup+0x25c?

gdb points to the following code line number:

(gdb) list *(acpi_ns_lookup+0x25c)
0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577).
572                 }
573         }
574
575         /* Extract one ACPI name from the front of the pathname */
576
577         ACPI_MOVE_32_TO_32(&simple_name, path);
578
579         /* Try to find the single (4 character) ACPI name */
580
581         status =
(gdb)

i.e. ACPI_MOVE_32_TO_32(&simple_name, path);

addr2line also confirms the same:

# addr2line -e vmlinux ffff0000084aa250
/root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577

Regards,
Bhupesh

>>
>> Regards,
>> Bhupesh
>>
>> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> via a kernel command line parameter, "memmap=".
>> >>
>> _______________________________________________
>> kexec mailing list -- kexec@lists.fedoraproject.org
>> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org

^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-18 21:28 ` Bhupesh Sharma
  (?)
  (?)
@ 2017-12-19  5:25 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-19  5:25 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Dave Young, Ard Biesheuvel, kexec, linux-acpi, linux-kernel,
	linux-arm-kernel, James Morse, Bhupesh SHARMA, linux-efi,
	Mark Rutland, Matt Fleming

On Tue, Dec 19, 2017 at 02:58:20AM +0530, Bhupesh Sharma wrote:
> Hi Dave,
>
> On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote:
> > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it
> > to kexec@lists.infradead.org
> >
> > Also add linux-acpi list
> > On 12/18/17 at 02:31am, Bhupesh Sharma wrote:
> >> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >> > On 15 December 2017 at 09:59, AKASHI Takahiro
> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> <takahiro.akashi@linaro.org> wrote:
> >> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> >> <takahiro.akashi@linaro.org> wrote:
> >> >>> >> > Bhupesh, Ard,
> >> >>> >> >
> >> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> >> >> Hi Ard, Akashi
> >> >>> >> >>
> >> >>> >> > (snip)
> >> >>> >> >
> >> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >>> >> >> , for details)
> >> >>> >> >
> >> >>> >> > Right.
> >> >>> >> >
> >> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> >> >> with the crashkernel memory range:
> >> >>> >> >>
> >> >>> >> >>         /* add linux,usable-memory-range */
> >> >>> >> >>         nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> >> >>         result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> >> >>                         PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> >> >>                         address_cells, size_cells);
> >> >>> >> >>
> >> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >>> >> >> , for details)
> >> >>> >> >>
> >> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> >> >> they are marked as System RAM or as RESERVED. As,
> >> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> >> >>
> >> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> >> >> ACPI memory and crashes while trying to access the same:
> >> >>> >> >>
> >> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >>> >> >> -r`.img --reuse-cmdline -d
> >> >>> >> >>
> >> >>> >> >> [snip..]
> >> >>> >> >>
> >> >>> >> >> Reserved memory range
> >> >>> >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> >> >>
> >> >>> >> >> Coredump memory ranges
> >> >>> >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> >> >> 000000002e800000-000000003961ffff (0)
> >> >>> >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> >> >> 000000a000000000-000000affbffffff (0)
> >> >>> >> >>
> >> >>> >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >>> >> >> memory cap'ing passed to the crash kernel inside
> >> >>> >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> >> >>
> >> >>> >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> >> >> {
> >> >>> >> >>         struct memblock_region reg = {
> >> >>> >> >>                 .size = 0,
> >> >>> >> >>         };
> >> >>> >> >>
> >> >>> >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> >> >>
> >> >>> >> >>         if (reg.size)
> >> >>> >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >> >>> >> >> comment this out */
> >> >>> >> >> }
> >> >>> >> >
> >> >>> >> > Please just don't do that. It can cause a fatal damage on
> >> >>> >> > memory contents of the *crashed* kernel.
> >> >>> >> >
> >> >>> >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> >> >>
> >> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> >> >> fail.
> >> >>> >> >>
> >> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >>> >> >> dt node 'linux,usable-memory-range'
> >> >>> >> >
> >> >>> >> > I still don't understand why we need to carry over the information
> >> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> >> >>> >> > such regions are free to be reused by the kernel after some point of
> >> >>> >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> >> >
> >> >>> >>
> >> >>> >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> >> kernel, those regions needs to be preserved, which is why they are
> >> >>> >> memblock_reserve()'d now.
> >> >>> >
> >> >>> > For my better understandings, who is actually accessing such regions
> >> >>> > during boot time, uefi itself or efistub?
> >> >>> >
> >> >>>
> >> >>> No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> instance, on QEMU we have
> >> >>>
> >> >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001
> >> >>> 01000013)
> >> >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001
> >> >>> BXPC 00000001)
> >> >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001
> >> >>> BXPC 00000001)
> >> >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001
> >> >>> BXPC 00000001)
> >> >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001
> >> >>> BXPC 00000001)
> >> >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001
> >> >>> BXPC 00000001)
> >> >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001
> >> >>> BXPC 00000001)
> >> >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001
> >> >>> BXPC 00000001)
> >> >>>
> >> >>> covered by
> >> >>>
> >> >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>> ...
> >> >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>
> >> >> OK. I mistakenly understood those regions could be freed after exiting
> >> >> UEFI boot services.
> >> >>
> >> >>>
> >> >>> >> So it seems that kexec does not honour the memblock_reserve() table
> >> >>> >> when booting the next kernel.
> >> >>> >
> >> >>> > not really.
> >> >>> >
> >> >>> >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >>> >> > on crash dump kernel?)
> >> >>> >> >
> >> >>> >>
> >> >>> >> I don't think so. And the change to the handling of ACPI reclaim
> >> >>> >> regions only revealed the bug, not created it (given that other
> >> >>> >> memblock_reserve regions may be affected as well)
> >> >>> >
> >> >>> > As whether we should honor such reserved regions over kexec'ing
> >> >>> > depends on each one's specific nature, we will have to take care one-by-one.
> >> >>> > As a matter of fact, no information about "reserved" memblocks is
> >> >>> > exposed to user space (via proc/iomem).
> >> >>> >
> >> >>>
> >> >>> That is why I suggested (somewhere in this thread?) to not expose them
> >> >>> as 'System RAM'. Do you think that could solve this?
> >> >>
> >> >> Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> marking them under another name in /proc/iomem would also be good in order
> >> >> not to allocate them as part of crash kernel's memory.
> >> >>
> >> >
> >> > I agree. However, this may not be entirely trivial, since iterating
> >> > over the memblock_reserved table and creating iomem entries may result
> >> > in collisions.
> >>
> >> I found a method (using the patch I shared earlier in this thread) to mark these
> >> entries as 'ACPI reclaim memory' ranges rather than System RAM or
> >> reserved regions.
> >>
> >> >> But I'm not still convinced that we should export them in useable-
> >> >> memory-range to crash dump kernel. They will be accessed through
> >> >> acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> (or memblocks), I guess.
> >> >
> >> > Agreed. They will be covered by the linear mapping in the boot kernel,
> >> > and be mapped explicitly via ioremap_cache() in the kexec kernel,
> >> > which is exactly what we want in this case.
> >>
> >> Now this is what is confusing me. I don't see the above happening.
> >>
> >> I see that the primary kernel boots up and adds the ACPI regions via:
> >> acpi_os_ioremap
> >>     -> ioremap_cache
> >>
> >> But during the crashkernel boot, ''acpi_os_ioremap' calls
> >> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache
> >> variant.
> >> > >> And it fails while accessing the ACPI tables: > >> > >> [ 0.039205] ACPI: Core revision 20170728 > >> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > >> [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > >> [ 0.100022] Modules linked in: > >> [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > >> [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > >> [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > >> [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > >> [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > >> pstate: 60000045 > >> [ 0.132647] sp : ffff000008ccfb40 > >> [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > >> [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > >> [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > >> [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > >> [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > >> [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > >> [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > >> [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > >> [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > >> [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > >> [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > >> [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > >> [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > >> [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > >> [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > >> [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > >> [ 0.223224] Call trace: > >> [ 0.225688] Exception stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > >> [ 0.232194] fa00: 0000000000000000 ffff000009710027 > >> ffff0000095e3980 ffff000008ccfbe0 > >> [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > >> ffff000008ccfc50 0000000000000000 > >> [ 0.248018] fa40: ffff8000126d0140 
000000000000005f > >> 00000000ffffff76 0000000000000006 > >> [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > >> 000000000000038e 0000000000000000 > >> [ 0.263843] fa80: 0000000000000000 0000000000000000 > >> 0000000000000005 000000000000001b > >> [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > >> ffff000009710027 0000000000000001 > >> [ 0.279667] fac0: 0000000000000001 000000000000001b > >> 0000000000000000 ffff0000088be820 > >> [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > >> ffff00000849b4f8 ffff000008ccfb40 > >> [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > >> ffff000008ccfb40 ffff000008260a18 > >> [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > >> ffff000008ccfb40 ffff0000084a6764 > >> [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > >> [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > >> [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > >> [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > >> [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > >> [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > >> [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > >> [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > >> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > >> [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > >> [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > >> [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > >> [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > >> [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > >> [ 0.394500] ---[ end trace c46ed37f9651c58e ]--- > >> [ 0.399160] Kernel panic - not syncing: Fatal exception > >> [ 0.404437] Rebooting in 10 seconds. > >> > >> So, I think the linear mapping done by the primary kernel does not > >> make these accessible in the crash kernel directly. 
> >> > >> Any pointers? > > > > Can you get the code line number for acpi_ns_lookup+0x25c? > > gdb points to the following code line number: > > (gdb) list *(acpi_ns_lookup+0x25c) > 0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577). > 572 } > 573 } > 574 > 575 /* Extract one ACPI name from the front of the pathname */ > 576 > 577 ACPI_MOVE_32_TO_32(&simple_name, path); > 578 > 579 /* Try to find the single (4 character) ACPI name */ > 580 > 581 status = > (gdb) > > i.e. ACPI_MOVE_32_TO_32(&simple_name, path); This macro can be defined in two ways depending on ACPI_MISALIGNMENT_NOT_SUPPORTED in drivers/acpi/acpica/acmacros.h. So, in principle, any use of ioremap() in acpi_os_ioremap() may be in conflict with those definitions here. This suggests that, under the current code base, we must expose ACPI reclaim regions as memblocks (i.e. via usable-memory-range) in order to avoid the reported issue. Thanks, -Takahiro AKASHI > addr2line also confirms the same: > > # addr2line -e vmlinux ffff0000084aa250 > /root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577 > > > Regards, > Bhupesh > > > >> > >> Regards, > >> Bhupesh > >> > >> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> >> via a kernel command line parameter, "memmap=". > >> >> > >> _______________________________________________ > >> kexec mailing list -- kexec@lists.fedoraproject.org > >> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
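[Illustration of the point made in the reply above about ACPI_MOVE_32_TO_32 being definable in two ways. The sketch below is not the ACPICA source: move32_direct() and move32_bytewise() are hypothetical stand-ins for the two styles of 4-byte copy that such a macro can expand to. The direct form issues a single 32-bit load, which is only safe when the source address is 4-byte aligned or the mapping tolerates unaligned access; the byte-wise form works for any address.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the "misalignment supported" style:
 * one 32-bit load and one 32-bit store. The asserts document the
 * precondition that both pointers are 4-byte aligned.
 */
static inline void move32_direct(void *dst, const void *src)
{
	assert(((uintptr_t)src & 3) == 0);
	assert(((uintptr_t)dst & 3) == 0);
	*(uint32_t *)dst = *(const uint32_t *)src;
}

/*
 * Hypothetical stand-in for the "misalignment not supported" style:
 * four single-byte copies, valid for any source and destination address.
 */
static inline void move32_bytewise(void *dst, const void *src)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	d[0] = s[0];
	d[1] = s[1];
	d[2] = s[2];
	d[3] = s[3];
}
```

[An ACPI pathname inside an AML byte stream can start at any byte offset, so which of the two styles is compiled in matters when the table is accessed through a mapping that does not permit unaligned loads.]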
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-19 5:25 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-19 5:25 UTC (permalink / raw) To: Bhupesh Sharma Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec, linux-kernel, linux-acpi, James Morse, Bhupesh SHARMA, Dave Young, linux-arm-kernel On Tue, Dec 19, 2017 at 02:58:20AM +0530, Bhupesh Sharma wrote: > Hi Dave, > > On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote: > > kexec@fedoraproject... is for Fedora kexec scripts discussion, changed it > > to kexec@lists.infradead.org > > > > Also add linux-acpi list > > On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > >> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > >> <ard.biesheuvel@linaro.org> wrote: > >> > On 15 December 2017 at 09:59, AKASHI Takahiro > >> > <takahiro.akashi@linaro.org> wrote: > >> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >> >>> <takahiro.akashi@linaro.org> wrote: > >> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >>> >> <takahiro.akashi@linaro.org> wrote: > >> >>> >> > Bhupesh, Ard, > >> >>> >> > > >> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >>> >> >> Hi Ard, Akashi > >> >>> >> >> > >> >>> >> > (snip) > >> >>> >> > > >> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >>> >> >> identify its own usable memory and exclude, at its boot time, any > >> >>> >> >> other memory areas that are part of the panicked kernel's memory. > >> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >>> >> >> , for details) > >> >>> >> > > >> >>> >> > Right. 
> >> >>> >> > > >> >>> >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >> >>> >> >> with the crashkernel memory range: > >> >>> >> >> > >> >>> >> >> /* add linux,usable-memory-range */ > >> >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >>> >> >> address_cells, size_cells); > >> >>> >> >> > >> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >>> >> >> , for details) > >> >>> >> >> > >> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >>> >> >> they are marked as System RAM or as RESERVED. As, > >> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >>> >> >> > >> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >>> >> >> ACPI memory and crashes while trying to access the same: > >> >>> >> >> > >> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >>> >> >> -r`.img --reuse-cmdline -d > >> >>> >> >> > >> >>> >> >> [snip..] > >> >>> >> >> > >> >>> >> >> Reserved memory range > >> >>> >> >> 000000000e800000-000000002e7fffff (0) > >> >>> >> >> > >> >>> >> >> Coredump memory ranges > >> >>> >> >> 0000000000000000-000000000e7fffff (0) > >> >>> >> >> 000000002e800000-000000003961ffff (0) > >> >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >> >>> >> >> 000000003ed60000-000000003fbfffff (0) > >> >>> >> >> 0000001040000000-0000001ffbffffff (0) > >> >>> >> >> 0000002000000000-0000002ffbffffff (0) > >> >>> >> >> 0000009000000000-0000009ffbffffff (0) > >> >>> >> >> 000000a000000000-000000affbffffff (0) > >> >>> >> >> > >> >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > >> >>> >> >> memory cap'ing passed to the crash kernel inside > >> >>> >> >> 'arch/arm64/mm/init.c' (see below): > >> >>> >> >> > >> >>> >> >> static void __init fdt_enforce_memory_region(void) > >> >>> >> >> { > >> >>> >> >> struct memblock_region reg = { > >> >>> >> >> .size = 0, > >> >>> >> >> }; > >> >>> >> >> > >> >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> >>> >> >> > >> >>> >> >> if (reg.size) > >> >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> >>> >> >> comment this out */ > >> >>> >> >> } > >> >>> >> > > >> >>> >> > Please just don't do that. It can cause a fatal damage on > >> >>> >> > memory contents of the *crashed* kernel. > >> >>> >> > > >> >>> >> >> 5). Both the above temporary solutions fix the problem. > >> >>> >> >> > >> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >>> >> >> fail. > >> >>> >> >> > >> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >>> >> >> dt node 'linux,usable-memory-range' > >> >>> >> > > >> >>> >> > I still don't understand why we need to carry over the information > >> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >>> >> > such regions are free to be reused by the kernel after some point of > >> >>> >> > initialization. Why does crash dump kernel need to know about them? > >> >>> >> > > >> >>> >> > >> >>> >> Not really.
According to the UEFI spec, they can be reclaimed after > >> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >>> >> no longer needs them. Of course, in order to be able to boot a kexec > >> >>> >> kernel, those regions needs to be preserved, which is why they are > >> >>> >> memblock_reserve()'d now. > >> >>> > > >> >>> > For my better understandings, who is actually accessing such regions > >> >>> > during boot time, uefi itself or efistub? > >> >>> > > >> >>> > >> >>> No, only the kernel. This is where the ACPI tables are stored. For > >> >>> instance, on QEMU we have > >> >>> > >> >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >>> 01000013) > >> >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >>> BXPC 00000001) > >> >>> > >> >>> covered by > >> >>> > >> >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >>> ... > >> >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >> > >> >> OK. I mistakenly understood those regions could be freed after exiting > >> >> UEFI boot services. > >> >> > >> >>> > >> >>> >> So it seems that kexec does not honour the memblock_reserve() table > >> >>> >> when booting the next kernel. > >> >>> > > >> >>> > not really. 
> >> >>> > > >> >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >> >>> >> > on crash dump kernel?) > >> >>> >> > > >> >>> >> > >> >>> >> I don't think so. And the change to the handling of ACPI reclaim > >> >>> >> regions only revealed the bug, not created it (given that other > >> >>> >> memblock_reserve regions may be affected as well) > >> >>> > > >> >>> > As whether we should honor such reserved regions over kexec'ing > >> >>> > depends on each one's specific nature, we will have to take care one-by-one. > >> >>> > As a matter of fact, no information about "reserved" memblocks is > >> >>> > exposed to user space (via proc/iomem). > >> >>> > > >> >>> > >> >>> That is why I suggested (somewhere in this thread?) to not expose them > >> >>> as 'System RAM'. Do you think that could solve this? > >> >> > >> >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> >> marking them under another name in /proc/iomem would also be good in order > >> >> not to allocate them as part of crash kernel's memory. > >> >> > >> > > >> > I agree. However, this may not be entirely trivial, since iterating > >> > over the memblock_reserved table and creating iomem entries may result > >> > in collisions. > >> > >> I found a method (using the patch I shared earlier in this thread) to mark these > >> entries as 'ACPI reclaim memory' ranges rather than System RAM or > >> reserved regions. > >> > >> >> But I'm not still convinced that we should export them in useable- > >> >> memory-range to crash dump kernel. They will be accessed through > >> >> acpi_os_map_memory() and so won't be required to be part of system ram > >> >> (or memblocks), I guess. > >> > > >> > Agreed. They will be covered by the linear mapping in the boot kernel, > >> > and be mapped explicitly via ioremap_cache() in the kexec kernel, > >> > which is exactly what we want in this case. > >> > >> Now this is what is confusing me. I don't see the above happening. 
> >> > >> I see that the primary kernel boots up and adds the ACPI regions via: > >> acpi_os_ioremap > >> -> ioremap_cache > >> > >> But during the crashkernel boot, ''acpi_os_ioremap' calls > >> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > >> variant. > >> > >> And it fails while accessing the ACPI tables: > >> > >> [ 0.039205] ACPI: Core revision 20170728 > >> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > >> [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > >> [ 0.100022] Modules linked in: > >> [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > >> [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > >> [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > >> [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > >> [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > >> pstate: 60000045 > >> [ 0.132647] sp : ffff000008ccfb40 > >> [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > >> [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > >> [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > >> [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > >> [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > >> [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > >> [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > >> [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > >> [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > >> [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > >> [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > >> [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > >> [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > >> [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > >> [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > >> [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > >> [ 0.223224] Call trace: > >> [ 0.225688] Exception 
stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > >> [ 0.232194] fa00: 0000000000000000 ffff000009710027 > >> ffff0000095e3980 ffff000008ccfbe0 > >> [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > >> ffff000008ccfc50 0000000000000000 > >> [ 0.248018] fa40: ffff8000126d0140 000000000000005f > >> 00000000ffffff76 0000000000000006 > >> [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > >> 000000000000038e 0000000000000000 > >> [ 0.263843] fa80: 0000000000000000 0000000000000000 > >> 0000000000000005 000000000000001b > >> [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > >> ffff000009710027 0000000000000001 > >> [ 0.279667] fac0: 0000000000000001 000000000000001b > >> 0000000000000000 ffff0000088be820 > >> [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > >> ffff00000849b4f8 ffff000008ccfb40 > >> [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > >> ffff000008ccfb40 ffff000008260a18 > >> [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > >> ffff000008ccfb40 ffff0000084a6764 > >> [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > >> [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > >> [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > >> [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > >> [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > >> [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > >> [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > >> [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > >> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > >> [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > >> [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > >> [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > >> [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > >> [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > >> [ 0.394500] ---[ end trace 
c46ed37f9651c58e ]--- > >> [ 0.399160] Kernel panic - not syncing: Fatal exception > >> [ 0.404437] Rebooting in 10 seconds. > >> > >> So, I think the linear mapping done by the primary kernel does not > >> make these accessible in the crash kernel directly. > >> > >> Any pointers? > > > > Can you get the code line number for acpi_ns_lookup+0x25c? > > gdb points to the following code line number: > > (gdb) list *(acpi_ns_lookup+0x25c) > 0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577). > 572 } > 573 } > 574 > 575 /* Extract one ACPI name from the front of the pathname */ > 576 > 577 ACPI_MOVE_32_TO_32(&simple_name, path); > 578 > 579 /* Try to find the single (4 character) ACPI name */ > 580 > 581 status = > (gdb) > > i.e. ACPI_MOVE_32_TO_32(&simple_name, path); This macro can be defined in two ways depending on ACPI_MISALIGNMENT_NOT_SUPPORTED in drivers/acpi/acpica/acmacros.h. So, in principle, any use of ioremap() in acpi_os_ioremap() may be in conflict with those definitions here. This suggests that, under the current code base, we must expose ACPI reclaim regions as memblocks (i.e. via usable-memory-range) in order to avoid the reported issue. Thanks, -Takahiro AKASHI > addr2line also confirms the same: > > # addr2line -e vmlinux ffff0000084aa250 > /root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577 > > > Regards, > Bhupesh > > > >> > >> Regards, > >> Bhupesh > >> > >> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> >> via a kernel command line parameter, "memmap=". > >> >> > >> _______________________________________________ > >> kexec mailing list -- kexec@lists.fedoraproject.org > >> To unsubscribe send an email to kexec-leave@lists.fedoraproject.org _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 135+ messages in thread
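[Illustration for the oops quoted in the message above. The sketch decodes the reported syndrome value 96000021 and checks the alignment of the address held in x22 (ffff000009710027), the base register of the faulting instruction b94002c0, which encodes ldr w0, [x22]. The field positions follow the Arm architecture's ESR_ELx layout (EC in bits [31:26], DFSC in bits [5:0]); the helper names esr_ec(), esr_dfsc() and is_aligned() are hypothetical and are not kernel APIs.]

```c
#include <stdint.h>

/* Exception Class: bits [31:26] of ESR_ELx. */
static inline unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data Fault Status Code: bits [5:0] of the ISS for a data abort. */
static inline unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* True when addr is a multiple of size; size must be a power of two. */
static inline int is_aligned(uint64_t addr, unsigned int size)
{
	return (addr & (uint64_t)(size - 1)) == 0;
}
```

[For 0x96000021 this gives EC 0x25 (data abort taken without a change in exception level) and DFSC 0x21 (alignment fault), and the x22 value is not 4-byte aligned. That reading is consistent with the explanation in the reply: a 32-bit load from an unaligned address inside a region mapped with ioremap(), rather than with a cacheable mapping, faults.]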
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-19 5:25 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-19 5:25 UTC (permalink / raw) To: linux-arm-kernel On Tue, Dec 19, 2017 at 02:58:20AM +0530, Bhupesh Sharma wrote: > Hi Dave, > > On Mon, Dec 18, 2017 at 10:46 AM, Dave Young <dyoung@redhat.com> wrote: > > kexec at fedoraproject... is for Fedora kexec scripts discussion, changed it > > to kexec at lists.infradead.org > > > > Also add linux-acpi list > > On 12/18/17 at 02:31am, Bhupesh Sharma wrote: > >> On Fri, Dec 15, 2017 at 3:05 PM, Ard Biesheuvel > >> <ard.biesheuvel@linaro.org> wrote: > >> > On 15 December 2017 at 09:59, AKASHI Takahiro > >> > <takahiro.akashi@linaro.org> wrote: > >> >> On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >>> On 13 December 2017 at 12:16, AKASHI Takahiro > >> >>> <takahiro.akashi@linaro.org> wrote: > >> >>> > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >>> >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >>> >> <takahiro.akashi@linaro.org> wrote: > >> >>> >> > Bhupesh, Ard, > >> >>> >> > > >> >>> >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >>> >> >> Hi Ard, Akashi > >> >>> >> >> > >> >>> >> > (snip) > >> >>> >> > > >> >>> >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >>> >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >>> >> >> identify its own usable memory and exclude, at its boot time, any > >> >>> >> >> other memory areas that are part of the panicked kernel's memory. > >> >>> >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >>> >> >> , for details) > >> >>> >> > > >> >>> >> > Right. > >> >>> >> > > >> >>> >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >> >>> >> >> with the crashkernel memory range: > >> >>> >> >> > >> >>> >> >> /* add linux,usable-memory-range */ > >> >>> >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >>> >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >>> >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >>> >> >> address_cells, size_cells); > >> >>> >> >> > >> >>> >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >>> >> >> , for details) > >> >>> >> >> > >> >>> >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >>> >> >> they are marked as System RAM or as RESERVED. As, > >> >>> >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >>> >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >>> >> >> > >> >>> >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >>> >> >> ACPI memory and crashes while trying to access the same: > >> >>> >> >> > >> >>> >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >>> >> >> -r`.img --reuse-cmdline -d > >> >>> >> >> > >> >>> >> >> [snip..] > >> >>> >> >> > >> >>> >> >> Reserved memory range > >> >>> >> >> 000000000e800000-000000002e7fffff (0) > >> >>> >> >> > >> >>> >> >> Coredump memory ranges > >> >>> >> >> 0000000000000000-000000000e7fffff (0) > >> >>> >> >> 000000002e800000-000000003961ffff (0) > >> >>> >> >> 0000000039d40000-000000003ed2ffff (0) > >> >>> >> >> 000000003ed60000-000000003fbfffff (0) > >> >>> >> >> 0000001040000000-0000001ffbffffff (0) > >> >>> >> >> 0000002000000000-0000002ffbffffff (0) > >> >>> >> >> 0000009000000000-0000009ffbffffff (0) > >> >>> >> >> 000000a000000000-000000affbffffff (0) > >> >>> >> >> > >> >>> >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > >> >>> >> >> memory cap'ing passed to the crash kernel inside > >> >>> >> >> 'arch/arm64/mm/init.c' (see below): > >> >>> >> >> > >> >>> >> >> static void __init fdt_enforce_memory_region(void) > >> >>> >> >> { > >> >>> >> >> struct memblock_region reg = { > >> >>> >> >> .size = 0, > >> >>> >> >> }; > >> >>> >> >> > >> >>> >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> >>> >> >> > >> >>> >> >> if (reg.size) > >> >>> >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> >>> >> >> comment this out */ > >> >>> >> >> } > >> >>> >> > > >> >>> >> > Please just don't do that. It can cause a fatal damage on > >> >>> >> > memory contents of the *crashed* kernel. > >> >>> >> > > >> >>> >> >> 5). Both the above temporary solutions fix the problem. > >> >>> >> >> > >> >>> >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> >>> >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >>> >> >> fail. > >> >>> >> >> > >> >>> >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> >>> >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >>> >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >>> >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >>> >> >> dt node 'linux,usable-memory-range' > >> >>> >> > > >> >>> >> > I still don't understand why we need to carry over the information > >> >>> >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >>> >> > such regions are free to be reused by the kernel after some point of > >> >>> >> > initialization. Why does crash dump kernel need to know about them? > >> >>> >> > > >> >>> >> > >> >>> >> Not really.
According to the UEFI spec, they can be reclaimed after > >> >>> >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >>> >> no longer needs them. Of course, in order to be able to boot a kexec > >> >>> >> kernel, those regions needs to be preserved, which is why they are > >> >>> >> memblock_reserve()'d now. > >> >>> > > >> >>> > For my better understandings, who is actually accessing such regions > >> >>> > during boot time, uefi itself or efistub? > >> >>> > > >> >>> > >> >>> No, only the kernel. This is where the ACPI tables are stored. For > >> >>> instance, on QEMU we have > >> >>> > >> >>> ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >>> ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >>> 01000013) > >> >>> ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >>> BXPC 00000001) > >> >>> ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >>> BXPC 00000001) > >> >>> > >> >>> covered by > >> >>> > >> >>> efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >>> ... > >> >>> efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >> > >> >> OK. I mistakenly understood those regions could be freed after exiting > >> >> UEFI boot services. > >> >> > >> >>> > >> >>> >> So it seems that kexec does not honour the memblock_reserve() table > >> >>> >> when booting the next kernel. > >> >>> > > >> >>> > not really. 
> >> >>> > > >> >>> >> > (In other words, can or should we skip some part of ACPI-related init code > >> >>> >> > on crash dump kernel?) > >> >>> >> > > >> >>> >> > >> >>> >> I don't think so. And the change to the handling of ACPI reclaim > >> >>> >> regions only revealed the bug, not created it (given that other > >> >>> >> memblock_reserve regions may be affected as well) > >> >>> > > >> >>> > As whether we should honor such reserved regions over kexec'ing > >> >>> > depends on each one's specific nature, we will have to take care one-by-one. > >> >>> > As a matter of fact, no information about "reserved" memblocks is > >> >>> > exposed to user space (via proc/iomem). > >> >>> > > >> >>> > >> >>> That is why I suggested (somewhere in this thread?) to not expose them > >> >>> as 'System RAM'. Do you think that could solve this? > >> >> > >> >> Memblock-reserv'ing them is necessary to prevent their corruption and > >> >> marking them under another name in /proc/iomem would also be good in order > >> >> not to allocate them as part of crash kernel's memory. > >> >> > >> > > >> > I agree. However, this may not be entirely trivial, since iterating > >> > over the memblock_reserved table and creating iomem entries may result > >> > in collisions. > >> > >> I found a method (using the patch I shared earlier in this thread) to mark these > >> entries as 'ACPI reclaim memory' ranges rather than System RAM or > >> reserved regions. > >> > >> >> But I'm not still convinced that we should export them in useable- > >> >> memory-range to crash dump kernel. They will be accessed through > >> >> acpi_os_map_memory() and so won't be required to be part of system ram > >> >> (or memblocks), I guess. > >> > > >> > Agreed. They will be covered by the linear mapping in the boot kernel, > >> > and be mapped explicitly via ioremap_cache() in the kexec kernel, > >> > which is exactly what we want in this case. > >> > >> Now this is what is confusing me. I don't see the above happening. 
> >> > >> I see that the primary kernel boots up and adds the ACPI regions via: > >> acpi_os_ioremap > >> -> ioremap_cache > >> > >> But during the crashkernel boot, ''acpi_os_ioremap' calls > >> 'ioremap' for the ACPI Reclaim Memory regions and not the _cache > >> variant. > >> > >> And it fails while accessing the ACPI tables: > >> > >> [ 0.039205] ACPI: Core revision 20170728 > >> pud=000000002e7d0003, *pmd=000000002e7c0003, *pte=00e8000039710707 > >> [ 0.095098] Internal error: Oops: 96000021 [#1] SMP > >> [ 0.100022] Modules linked in: > >> [ 0.103102] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1 > >> [ 0.109432] task: ffff000008d05180 task.stack: ffff000008cc0000 > >> [ 0.115414] PC is at acpi_ns_lookup+0x25c/0x3c0 > >> [ 0.119987] LR is at acpi_ds_load1_begin_op+0xa4/0x294 > >> [ 0.125175] pc : [<ffff0000084a6764>] lr : [<ffff00000849b4f8>] > >> pstate: 60000045 > >> [ 0.132647] sp : ffff000008ccfb40 > >> [ 0.135989] x29: ffff000008ccfb40 x28: ffff000008a9f2a4 > >> [ 0.141354] x27: ffff0000088be820 x26: 0000000000000000 > >> [ 0.146718] x25: 000000000000001b x24: 0000000000000001 > >> [ 0.152083] x23: 0000000000000001 x22: ffff000009710027 > >> [ 0.157447] x21: ffff000008ccfc50 x20: 0000000000000001 > >> [ 0.162812] x19: 000000000000001b x18: 0000000000000005 > >> [ 0.168176] x17: 0000000000000000 x16: 0000000000000000 > >> [ 0.173541] x15: 0000000000000000 x14: 000000000000038e > >> [ 0.178905] x13: ffffffff00000000 x12: ffffffffffffffff > >> [ 0.184270] x11: 0000000000000006 x10: 00000000ffffff76 > >> [ 0.189634] x9 : 000000000000005f x8 : ffff8000126d0140 > >> [ 0.194998] x7 : 0000000000000000 x6 : ffff000008ccfc50 > >> [ 0.200362] x5 : ffff80000fe62c00 x4 : 0000000000000001 > >> [ 0.205727] x3 : ffff000008ccfbe0 x2 : ffff0000095e3980 > >> [ 0.211091] x1 : ffff000009710027 x0 : 0000000000000000 > >> [ 0.216456] Process swapper/0 (pid: 0, stack limit = 0xffff000008cc0000) > >> [ 0.223224] Call trace: > >> [ 0.225688] Exception 
stack(0xffff000008ccfa00 to 0xffff000008ccfb40) > >> [ 0.232194] fa00: 0000000000000000 ffff000009710027 > >> ffff0000095e3980 ffff000008ccfbe0 > >> [ 0.240106] fa20: 0000000000000001 ffff80000fe62c00 > >> ffff000008ccfc50 0000000000000000 > >> [ 0.248018] fa40: ffff8000126d0140 000000000000005f > >> 00000000ffffff76 0000000000000006 > >> [ 0.255931] fa60: ffffffffffffffff ffffffff00000000 > >> 000000000000038e 0000000000000000 > >> [ 0.263843] fa80: 0000000000000000 0000000000000000 > >> 0000000000000005 000000000000001b > >> [ 0.271754] faa0: 0000000000000001 ffff000008ccfc50 > >> ffff000009710027 0000000000000001 > >> [ 0.279667] fac0: 0000000000000001 000000000000001b > >> 0000000000000000 ffff0000088be820 > >> [ 0.287579] fae0: ffff000008a9f2a4 ffff000008ccfb40 > >> ffff00000849b4f8 ffff000008ccfb40 > >> [ 0.295491] fb00: ffff0000084a6764 0000000060000045 > >> ffff000008ccfb40 ffff000008260a18 > >> [ 0.303403] fb20: ffffffffffffffff ffff0000087f3fb0 > >> ffff000008ccfb40 ffff0000084a6764 > >> [ 0.311316] [<ffff0000084a6764>] acpi_ns_lookup+0x25c/0x3c0 > >> [ 0.316943] [<ffff00000849b4f8>] acpi_ds_load1_begin_op+0xa4/0x294 > >> [ 0.323186] [<ffff0000084ad4ac>] acpi_ps_build_named_op+0xc4/0x198 > >> [ 0.329428] [<ffff0000084ad6cc>] acpi_ps_create_op+0x14c/0x270 > >> [ 0.335319] [<ffff0000084acfa8>] acpi_ps_parse_loop+0x188/0x5c8 > >> [ 0.341298] [<ffff0000084ae048>] acpi_ps_parse_aml+0xb0/0x2b8 > >> [ 0.347101] [<ffff0000084a8e10>] acpi_ns_one_complete_parse+0x144/0x184 > >> [ 0.353783] [<ffff0000084a8e98>] acpi_ns_parse_table+0x48/0x68 > >> [ 0.359675] [<ffff0000084a82cc>] acpi_ns_load_table+0x4c/0xdc > >> [ 0.365479] [<ffff0000084b32f8>] acpi_tb_load_namespace+0xe4/0x264 > >> [ 0.371723] [<ffff000008baf9b4>] acpi_load_tables+0x48/0xc0 > >> [ 0.377350] [<ffff000008badc20>] acpi_early_init+0x9c/0xd0 > >> [ 0.382891] [<ffff000008b70d50>] start_kernel+0x3b4/0x43c > >> [ 0.388343] Code: b9008fb9 2a000318 36380054 32190318 (b94002c0) > >> [ 0.394500] ---[ end trace 
c46ed37f9651c58e ]--- > >> [ 0.399160] Kernel panic - not syncing: Fatal exception > >> [ 0.404437] Rebooting in 10 seconds. > >> > >> So, I think the linear mapping done by the primary kernel does not > >> make these accessible in the crash kernel directly. > >> > >> Any pointers? > > > > Can you get the code line number for acpi_ns_lookup+0x25c? > > gdb points to the following code line number: > > (gdb) list *(acpi_ns_lookup+0x25c) > 0xffff0000084aa250 is in acpi_ns_lookup (drivers/acpi/acpica/nsaccess.c:577). > 572 } > 573 } > 574 > 575 /* Extract one ACPI name from the front of the pathname */ > 576 > 577 ACPI_MOVE_32_TO_32(&simple_name, path); > 578 > 579 /* Try to find the single (4 character) ACPI name */ > 580 > 581 status = > (gdb) > > i.e. ACPI_MOVE_32_TO_32(&simple_name, path); This macro can be defined in two ways depending on ACPI_MISALIGNMENT_NOT_SUPPORTED in drivers/acpi/acpica/acmacros.h. So, in principle, any use of ioremap() in acpi_os_ioremap() may conflict with those definitions. This suggests that, under the current code base, we must expose ACPI reclaim regions as memblocks (i.e. via usable-memory-range) in order to avoid the reported issue. Thanks, -Takahiro AKASHI > addr2line also confirms the same: > > # addr2line -e vmlinux ffff0000084aa250 > /root/git/kernel-alt/drivers/acpi/acpica/nsaccess.c:577 > > > Regards, > Bhupesh > > > >> > >> Regards, > >> Bhupesh > >> > >> >> Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> >> via a kernel command line parameter, "memmap=". > >> >> > >> _______________________________________________ > >> kexec mailing list -- kexec at lists.fedoraproject.org > >> To unsubscribe send an email to kexec-leave at lists.fedoraproject.org ^ permalink raw reply [flat|nested] 135+ messages in thread
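The fault discussed in the message above can be checked by hand from the quoted oops. What follows is a minimal sketch, not code taken from the kernel or from ACPICA: all helper names are hypothetical, the bit positions are the architectural ESR_ELx and stage-1 descriptor fields, and reading AttrIndx 1 as a Device memory type assumes the MAIR layout used by arm64 kernels of that period. It suggests that ESR 0x96000021 is an alignment fault, that the faulting PTE maps physical address 0x39710000 with a Device attribute index, and that the 'path' pointer held in x22 (ffff000009710027) is not 4-byte aligned. The byte-wise helper stands in for the variant of ACPI_MOVE_32_TO_32 that never issues a 32-bit access.

```c
#include <stdint.h>

/* Exception class, ESR_ELx bits [31:26].
 * 0x25 = data abort taken without a change in exception level. */
static inline unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Data fault status code, ESR_ELx bits [5:0]. 0x21 = alignment fault. */
static inline unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}

/* AttrIndx, stage-1 descriptor bits [4:2]: index into MAIR_EL1.
 * Assumption: index 1 is a Device type on kernels of that period. */
static inline unsigned int pte_attrindx(uint64_t pte)
{
	return (pte >> 2) & 0x7;
}

/* Output address, descriptor bits [47:12]. For the value in the oops
 * bits [15:12] are zero, so the result is the same for 4K and 64K pages. */
static inline uint64_t pte_phys(uint64_t pte)
{
	return pte & 0x0000fffffffff000ULL;
}

/* A single 32-bit load is only naturally aligned on a 4-byte boundary. */
static inline int is_aligned_32(uint64_t addr)
{
	return (addr & 0x3) == 0;
}

/* Hypothetical stand-in for the byte-wise form of the macro: four
 * single-byte accesses, each of which is always naturally aligned. */
static inline void move_32_bytewise(void *dst, const void *src)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	d[0] = s[0];
	d[1] = s[1];
	d[2] = s[2];
	d[3] = s[3];
}
```

Fed with the values from the log, esr_dfsc(0x96000021) gives 0x21 and pte_attrindx(0x00e8000039710707) gives 1, while is_aligned_32(0xffff000009710027) is false. The faulting instruction word b94002c0 appears to decode to "ldr w0, [x22]", i.e. a 32-bit load from that unaligned address, which is consistent with the plain-assignment form of the macro being in use.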
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-15 8:59 ` AKASHI Takahiro @ 2017-12-18 5:40 ` Dave Young -1 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-18 5:40 UTC (permalink / raw) To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > > On 13 December 2017 at 12:16, AKASHI Takahiro > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > >> > Bhupesh, Ard, > > >> > > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > > >> >> Hi Ard, Akashi > > >> >> > > >> > (snip) > > >> > > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > > >> >> identify its own usable memory and exclude, at its boot time, any > > >> >> other memory areas that are part of the panicked kernel's memory. > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > > >> >> , for details) > > >> > > > >> > Right. > > >> > > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > > >> >> with the crashkernel memory range: > > >> >> > > >> >> /* add linux,usable-memory-range */ > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > > >> >> address_cells, size_cells); > > >> >> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > > >> >> , for details) > > >> >> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > > >> >> they are marked as System RAM or as RESERVED. As, > > >> >> 'linux,usable-memory-range' dt node is patched up only with > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > > >> >> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > > >> >> ACPI memory and crashes while trying to access the same: > > >> >> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > > >> >> -r`.img --reuse-cmdline -d > > >> >> > > >> >> [snip..] > > >> >> > > >> >> Reserved memory range > > >> >> 000000000e800000-000000002e7fffff (0) > > >> >> > > >> >> Coredump memory ranges > > >> >> 0000000000000000-000000000e7fffff (0) > > >> >> 000000002e800000-000000003961ffff (0) > > >> >> 0000000039d40000-000000003ed2ffff (0) > > >> >> 000000003ed60000-000000003fbfffff (0) > > >> >> 0000001040000000-0000001ffbffffff (0) > > >> >> 0000002000000000-0000002ffbffffff (0) > > >> >> 0000009000000000-0000009ffbffffff (0) > > >> >> 000000a000000000-000000affbffffff (0) > > >> >> > > >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > > >> >> memory cap'ing passed to the crash kernel inside > > >> >> 'arch/arm64/mm/init.c' (see below): > > >> >> > > >> >> static void __init fdt_enforce_memory_region(void) > > >> >> { > > >> >> struct memblock_region reg = { > > >> >> .size = 0, > > >> >> }; > > >> >> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > >> >> > > >> >> if (reg.size) > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > > >> >> comment this out */ > > >> >> } > > >> > > > >> > Please just don't do that. It can cause a fatal damage on > > >> > memory contents of the *crashed* kernel. > > >> > > > >> >> 5). Both the above temporary solutions fix the problem. > > >> >> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > > >> >> fail. > > >> >> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > > >> >> dt node 'linux,usable-memory-range' > > >> > > > >> > I still don't understand why we need to carry over the information > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > > >> > such regions are free to be reused by the kernel after some point of > > >> > initialization. Why does crash dump kernel need to know about them? > > >> > > > >> > > >> Not really. According to the UEFI spec, they can be reclaimed after > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > > >> no longer needs them. Of course, in order to be able to boot a kexec > > >> kernel, those regions needs to be preserved, which is why they are > > >> memblock_reserve()'d now.
> > > > > > For my better understandings, who is actually accessing such regions > > > during boot time, uefi itself or efistub? > > > > > > > No, only the kernel. This is where the ACPI tables are stored. For > > instance, on QEMU we have > > > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > > 01000013) > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > > BXPC 00000001) > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > > BXPC 00000001) > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > > BXPC 00000001) > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > > BXPC 00000001) > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > > BXPC 00000001) > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > > BXPC 00000001) > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > > BXPC 00000001) > > > > covered by > > > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > > ... > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > > OK. I mistakenly understood those regions could be freed after exiting > UEFI boot services. > > > > > >> So it seems that kexec does not honour the memblock_reserve() table > > >> when booting the next kernel. > > > > > > not really. > > > > > >> > (In other words, can or should we skip some part of ACPI-related init code > > >> > on crash dump kernel?) > > >> > > > >> > > >> I don't think so. And the change to the handling of ACPI reclaim > > >> regions only revealed the bug, not created it (given that other > > >> memblock_reserve regions may be affected as well) > > > > > > As whether we should honor such reserved regions over kexec'ing > > > depends on each one's specific nature, we will have to take care one-by-one. 
> > > As a matter of fact, no information about "reserved" memblocks is > > > exposed to user space (via proc/iomem). > > > > > > > That is why I suggested (somewhere in this thread?) to not expose them > > as 'System RAM'. Do you think that could solve this? > > Memblock-reserv'ing them is necessary to prevent their corruption and > marking them under another name in /proc/iomem would also be good in order > not to allocate them as part of crash kernel's memory. > > But I'm not still convinced that we should export them in useable- > memory-range to crash dump kernel. They will be accessed through > acpi_os_map_memory() and so won't be required to be part of system ram > (or memblocks), I guess. > -> Bhupesh? I forgot how the arm64 kernel retrieves the memory ranges and initializes them. If there is no "e820"-like interface, shouldn't the kernel reinitialize all the memory according to the EFI memmap? For the kdump kernel, anything other than usable memory (which comes from the dt node instead) should be reinitialized according to the info passed by EFI, no? > > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > via a kernel command line parameter, "memmap=". memmap= is only used in old kexec-tools; now we are passing them via the e820 table. [snip] Thanks Dave ^ permalink raw reply [flat|nested] 135+ messages in thread
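The capping behaviour raised in the question above can be illustrated with the numbers from the quoted 'kexec -p' output. This is a rough sketch under stated assumptions, not the kernel's code: the struct and function names are hypothetical, and the rule "addresses outside the usable range get a Device mapping" is a simplified model of what the thread reports (the crash kernel only treats the 'linux,usable-memory-range' window as memory, and acpi_os_ioremap() was observed to fall back to ioremap() for the ACPI reclaim regions).

```c
#include <stdint.h>

/* One (base, size) pair, as carried by 'linux,usable-memory-range'. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Return 1 if [addr, addr + len) lies entirely inside r.
 * Written to avoid overflow in addr + len. */
static inline int range_contains(const struct mem_range *r,
				 uint64_t addr, uint64_t len)
{
	return addr >= r->base &&
	       len <= r->size &&
	       addr - r->base <= r->size - len;
}

enum map_kind {
	MAP_DEVICE,	/* models the ioremap() path */
	MAP_CACHED,	/* models the ioremap_cache() path */
};

/* Simplified model (assumption): after capping, only the usable range is
 * known as memory, so anything outside it ends up with a Device mapping. */
static inline enum map_kind model_acpi_map_kind(const struct mem_range *usable,
						uint64_t phys, uint64_t len)
{
	return range_contains(usable, phys, len) ? MAP_CACHED : MAP_DEVICE;
}
```

With the reserved range from the log (base 0x0e800000, size 0x20000000, i.e. 000000000e800000-000000002e7fffff), physical address 0x39710000 from the faulting PTE falls outside the window. It also sits in the gap between the coredump ranges ending at 3961ffff and starting at 39d40000, so in this model it is not regular memory from the crash kernel's point of view and gets MAP_DEVICE.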
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  [not found] ` <20171218054009.GA6392-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
  2017-12-19 6:09 ` AKASHI Takahiro
  (?)
@ 2017-12-18 5:43 ` Dave Young
  0 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-18 5:43 UTC (permalink / raw)
To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA,
    Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland,
    James Morse, kexec, linux-kernel

Fix the kexec list address.

On 12/18/17 at 01:40pm, Dave Young wrote:
> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> > > <takahiro.akashi@linaro.org> wrote:
> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> > > >> <takahiro.akashi@linaro.org> wrote:
> > > >> > Bhupesh, Ard,
> > > >> >
> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> > > >> >> Hi Ard, Akashi
> > > >> >>
> > > >> > (snip)
> > > >> >
> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> > > >> >> identify its own usable memory and exclude, at its boot time, any
> > > >> >> other memory areas that are part of the panicked kernel's memory.
> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> > > >> >> , for details)
> > > >> >
> > > >> > Right.
> > > >> >
> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> > > >> >> with the crashkernel memory range:
> > > >> >>
> > > >> >>     /* add linux,usable-memory-range */
> > > >> >>     nodeoffset = fdt_path_offset(new_buf, "/chosen");
> > > >> >>     result = fdt_setprop_range(new_buf, nodeoffset,
> > > >> >>             PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> > > >> >>             address_cells, size_cells);
> > > >> >>
> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> > > >> >> , for details)
> > > >> >>
> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> > > >> >> they are marked as System RAM or as RESERVED. As,
> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> > > >> >>
> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> > > >> >> ACPI memory and crashes while trying to access the same:
> > > >> >>
> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> > > >> >> -r`.img --reuse-cmdline -d
> > > >> >>
> > > >> >> [snip..]
> > > >> >>
> > > >> >> Reserved memory range
> > > >> >> 000000000e800000-000000002e7fffff (0)
> > > >> >>
> > > >> >> Coredump memory ranges
> > > >> >> 0000000000000000-000000000e7fffff (0)
> > > >> >> 000000002e800000-000000003961ffff (0)
> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> > > >> >> 000000003ed60000-000000003fbfffff (0)
> > > >> >> 0000001040000000-0000001ffbffffff (0)
> > > >> >> 0000002000000000-0000002ffbffffff (0)
> > > >> >> 0000009000000000-0000009ffbffffff (0)
> > > >> >> 000000a000000000-000000affbffffff (0)
> > > >> >>
> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> > > >> >> memory cap'ing passed to the crash kernel inside
> > > >> >> 'arch/arm64/mm/init.c' (see below):
> > > >> >>
> > > >> >> static void __init fdt_enforce_memory_region(void)
> > > >> >> {
> > > >> >>     struct memblock_region reg = {
> > > >> >>         .size = 0,
> > > >> >>     };
> > > >> >>
> > > >> >>     of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> > > >> >>
> > > >> >>     if (reg.size)
> > > >> >>         //memblock_cap_memory_range(reg.base, reg.size); /*
> > > >> >> comment this out */
> > > >> >> }
> > > >> >
> > > >> > Please just don't do that. It can cause a fatal damage on
> > > >> > memory contents of the *crashed* kernel.
> > > >> >
> > > >> >> 5). Both the above temporary solutions fix the problem.
> > > >> >>
> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> > > >> >> fail.
> > > >> >>
> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> > > >> >> dt node 'linux,usable-memory-range'
> > > >> >
> > > >> > I still don't understand why we need to carry over the information
> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings,
> > > >> > such regions are free to be reused by the kernel after some point of
> > > >> > initialization. Why does crash dump kernel need to know about them?
> > > >> >
> > > >>
> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> > > >> kernel, those regions needs to be preserved, which is why they are
> > > >> memblock_reserve()'d now.
> > > >
> > > > For my better understandings, who is actually accessing such regions
> > > > during boot time, uefi itself or efistub?
> > > >
> > >
> > > No, only the kernel. This is where the ACPI tables are stored. For
> > > instance, on QEMU we have
> > >
> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001
> > > 01000013)
> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001
> > > BXPC 00000001)
> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001
> > > BXPC 00000001)
> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001
> > > BXPC 00000001)
> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001
> > > BXPC 00000001)
> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001
> > > BXPC 00000001)
> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001
> > > BXPC 00000001)
> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001
> > > BXPC 00000001)
> > >
> > > covered by
> > >
> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> > > ...
> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >
> > OK. I mistakenly understood those regions could be freed after exiting
> > UEFI boot services.
> >
> > >
> > > >> So it seems that kexec does not honour the memblock_reserve() table
> > > >> when booting the next kernel.
> > > >
> > > > not really.
> > > >
> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> > > >> > on crash dump kernel?)
> > > >> >
> > > >>
> > > >> I don't think so. And the change to the handling of ACPI reclaim
> > > >> regions only revealed the bug, not created it (given that other
> > > >> memblock_reserve regions may be affected as well)
> > > >
> > > > As whether we should honor such reserved regions over kexec'ing
> > > > depends on each one's specific nature, we will have to take care one-by-one.
> > > > As a matter of fact, no information about "reserved" memblocks is
> > > > exposed to user space (via proc/iomem).
> > > >
> > >
> > > That is why I suggested (somewhere in this thread?) to not expose them
> > > as 'System RAM'. Do you think that could solve this?
> >
> > Memblock-reserv'ing them is necessary to prevent their corruption and
> > marking them under another name in /proc/iomem would also be good in order
> > not to allocate them as part of crash kernel's memory.
> >
> > But I'm not still convinced that we should export them in useable-
> > memory-range to crash dump kernel. They will be accessed through
> > acpi_os_map_memory() and so won't be required to be part of system ram
> > (or memblocks), I guess.
> > -> Bhupesh?
>
> I forgot how arm64 kernel retrieve the memory ranges and initialize
> them. If no "e820" like interfaces shouldn't kernel reinitialize all
> the memory according to the efi memmap? For kdump kernel anything other
> than usable memory (which is from the dt node instead) should be
> reinitialized according to efi passed info, no?
>
> >
> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> > via a kernel command line parameter, "memmap=".
>
> memmap= is only used in old kexec-tools, now we are passing them via
> e820 table.
>
> [snip]
>
> Thanks
> Dave

^ permalink raw reply [flat|nested] 135+ messages in thread
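Approach 6a) quoted above has the user-space kexec-tools pick the ACPI reclaim regions out of '/proc/iomem'. A minimal sketch of that parsing step might look like the following. It is not the kexec-tools code: parse_iomem_line(), struct iomem_range and the label string "ACPI reclaim region" are hypothetical, since the thread does not say what name such entries would carry, and the addresses used in any example input are placeholders.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one /proc/iomem entry. */
struct iomem_range {
	uint64_t start;
	uint64_t end;	/* inclusive, as printed in /proc/iomem */
};

/*
 * Parse one /proc/iomem-style line of the form
 *   "<start>-<end> : <name>"
 * with hexadecimal addresses. Returns 1 and fills 'out' when the entry's
 * name equals 'label'; returns 0 otherwise, including on malformed input.
 */
static int parse_iomem_line(const char *line, const char *label,
			    struct iomem_range *out)
{
	uint64_t start, end;
	int consumed = 0;
	const char *name;
	size_t len;

	assert(line != NULL && label != NULL && out != NULL);

	/* %n records where the name starts; it stays 0 if " : " is missing. */
	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n",
		   &start, &end, &consumed) != 2 || consumed == 0)
		return 0;

	/* Ignore a trailing newline or trailing blanks after the name. */
	name = line + consumed;
	len = strlen(name);
	while (len > 0 && (name[len - 1] == '\n' || name[len - 1] == ' '))
		len--;

	if (len != strlen(label) || strncmp(name, label, len) != 0)
		return 0;

	out->start = start;
	out->end = end;
	return 1;
}
```

A caller would read '/proc/iomem' line by line with fgets() and collect every range for which parse_iomem_line() returns 1. Whether those ranges should then go into 'linux,usable-memory-range' is exactly what is being debated in this thread.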
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-18 5:43 ` Dave Young 0 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-18 5:43 UTC (permalink / raw) To: AKASHI Takahiro, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-kernel-u79uwXL29TY76Z2rM5mHXA Fix the kexec list address. On 12/18/17 at 01:40pm, Dave Young wrote: > On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > > > On 13 December 2017 at 12:16, AKASHI Takahiro > > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > > >> > Bhupesh, Ard, > > > >> > > > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > > > >> >> Hi Ard, Akashi > > > >> >> > > > >> > (snip) > > > >> > > > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > > > >> >> identify its own usable memory and exclude, at its boot time, any > > > >> >> other memory areas that are part of the panicked kernel's memory. > > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > > > >> >> , for details) > > > >> > > > > >> > Right. > > > >> > > > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > > > >> >> with the crashkernel memory range: > > > >> >> > > > >> >> /* add linux,usable-memory-range */ > > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > > > >> >> address_cells, size_cells); > > > >> >> > > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > > > >> >> , for details) > > > >> >> > > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > > > >> >> they are marked as System RAM or as RESERVED. As, > > > >> >> 'linux,usable-memory-range' dt node is patched up only with > > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > > > >> >> > > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > > > >> >> ACPI memory and crashes while trying to access the same: > > > >> >> > > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > > > >> >> -r`.img --reuse-cmdline -d > > > >> >> > > > >> >> [snip..] > > > >> >> > > > >> >> Reserved memory range > > > >> >> 000000000e800000-000000002e7fffff (0) > > > >> >> > > > >> >> Coredump memory ranges > > > >> >> 0000000000000000-000000000e7fffff (0) > > > >> >> 000000002e800000-000000003961ffff (0) > > > >> >> 0000000039d40000-000000003ed2ffff (0) > > > >> >> 000000003ed60000-000000003fbfffff (0) > > > >> >> 0000001040000000-0000001ffbffffff (0) > > > >> >> 0000002000000000-0000002ffbffffff (0) > > > >> >> 0000009000000000-0000009ffbffffff (0) > > > >> >> 000000a000000000-000000affbffffff (0) > > > >> >> > > > >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > > > >> >> memory cap'ing passed to the crash kernel inside > > > >> >> 'arch/arm64/mm/init.c' (see below): > > > >> >> > > > >> >> static void __init fdt_enforce_memory_region(void) > > > >> >> { > > > >> >> struct memblock_region reg = { > > > >> >> .size = 0, > > > >> >> }; > > > >> >> > > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > > >> >> > > > >> >> if (reg.size) > > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > > > >> >> comment this out */ > > > >> >> } > > > >> > > > > >> > Please just don't do that. It can cause a fatal damage on > > > >> > memory contents of the *crashed* kernel. > > > >> > > > > >> >> 5). Both the above temporary solutions fix the problem. > > > >> >> > > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > > > >> >> fail. > > > >> >> > > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > > > >> >> dt node 'linux,usable-memory-range' > > > >> > > > > >> > I still don't understand why we need to carry over the information > > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > > > >> > such regions are free to be reused by the kernel after some point of > > > >> > initialization. Why does crash dump kernel need to know about them? > > > >> > > > > >> > > > >> Not really. According to the UEFI spec, they can be reclaimed after > > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > > > >> no longer needs them.
Of course, in order to be able to boot a kexec > > > >> kernel, those regions needs to be preserved, which is why they are > > > >> memblock_reserve()'d now. > > > > > > > > For my better understandings, who is actually accessing such regions > > > > during boot time, uefi itself or efistub? > > > > > > > > > > No, only the kernel. This is where the ACPI tables are stored. For > > > instance, on QEMU we have > > > > > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > > > 01000013) > > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > > > BXPC 00000001) > > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > > > BXPC 00000001) > > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > > > BXPC 00000001) > > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > > > BXPC 00000001) > > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > > > BXPC 00000001) > > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > > > BXPC 00000001) > > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > > > BXPC 00000001) > > > > > > covered by > > > > > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > > > ... > > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > > > > OK. I mistakenly understood those regions could be freed after exiting > > UEFI boot services. > > > > > > > > >> So it seems that kexec does not honour the memblock_reserve() table > > > >> when booting the next kernel. > > > > > > > > not really. > > > > > > > >> > (In other words, can or should we skip some part of ACPI-related init code > > > >> > on crash dump kernel?) > > > >> > > > > >> > > > >> I don't think so. 
And the change to the handling of ACPI reclaim > > > >> regions only revealed the bug, not created it (given that other > > > >> memblock_reserve regions may be affected as well) > > > > > > > > As whether we should honor such reserved regions over kexec'ing > > > > depends on each one's specific nature, we will have to take care one-by-one. > > > > As a matter of fact, no information about "reserved" memblocks is > > > > exposed to user space (via proc/iomem). > > > > > > > > > > That is why I suggested (somewhere in this thread?) to not expose them > > > as 'System RAM'. Do you think that could solve this? > > > > Memblock-reserv'ing them is necessary to prevent their corruption and > > marking them under another name in /proc/iomem would also be good in order > > not to allocate them as part of crash kernel's memory. > > > > But I'm not still convinced that we should export them in useable- > > memory-range to crash dump kernel. They will be accessed through > > acpi_os_map_memory() and so won't be required to be part of system ram > > (or memblocks), I guess. > > -> Bhupesh? > > I forgot how arm64 kernel retrieve the memory ranges and initialize > them. If no "e820" like interfaces shouldn't kernel reinitialize all > the memory according to the efi memmap? For kdump kernel anything other > than usable memory (which is from the dt node instead) should be > reinitialized according to efi passed info, no? > > > > > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > > via a kernel command line parameter, "memmap=". > > memmap= is only used in old kexec-tools, now we are passing them via > e820 table. > > [snip] > > Thanks > Dave ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-18 5:40 ` Dave Young @ 2017-12-19 6:09 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-19 6:09 UTC (permalink / raw) To: Dave Young Cc: Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > > > On 13 December 2017 at 12:16, AKASHI Takahiro > > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > > >> > Bhupesh, Ard, > > > >> > > > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > > > >> >> Hi Ard, Akashi > > > >> >> > > > >> > (snip) > > > >> > > > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > > > >> >> identify its own usable memory and exclude, at its boot time, any > > > >> >> other memory areas that are part of the panicked kernel's memory. > > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > > > >> >> , for details) > > > >> > > > > >> > Right. > > > >> > > > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > > > >> >> with the crashkernel memory range: > > > >> >> > > > >> >> /* add linux,usable-memory-range */ > > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > > > >> >> address_cells, size_cells); > > > >> >> > > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > > > >> >> , for details) > > > >> >> > > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > > > >> >> they are marked as System RAM or as RESERVED. As, > > > >> >> 'linux,usable-memory-range' dt node is patched up only with > > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > > > >> >> > > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > > > >> >> ACPI memory and crashes while trying to access the same: > > > >> >> > > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > > > >> >> -r`.img --reuse-cmdline -d > > > >> >> > > > >> >> [snip..] > > > >> >> > > > >> >> Reserved memory range > > > >> >> 000000000e800000-000000002e7fffff (0) > > > >> >> > > > >> >> Coredump memory ranges > > > >> >> 0000000000000000-000000000e7fffff (0) > > > >> >> 000000002e800000-000000003961ffff (0) > > > >> >> 0000000039d40000-000000003ed2ffff (0) > > > >> >> 000000003ed60000-000000003fbfffff (0) > > > >> >> 0000001040000000-0000001ffbffffff (0) > > > >> >> 0000002000000000-0000002ffbffffff (0) > > > >> >> 0000009000000000-0000009ffbffffff (0) > > > >> >> 000000a000000000-000000affbffffff (0) > > > >> >> > > > >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the > > > >> >> memory cap'ing passed to the crash kernel inside > > > >> >> 'arch/arm64/mm/init.c' (see below): > > > >> >> > > > >> >> static void __init fdt_enforce_memory_region(void) > > > >> >> { > > > >> >> struct memblock_region reg = { > > > >> >> .size = 0, > > > >> >> }; > > > >> >> > > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > > >> >> > > > >> >> if (reg.size) > > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > > > >> >> comment this out */ > > > >> >> } > > > >> > > > > >> > Please just don't do that. It can cause a fatal damage on > > > >> > memory contents of the *crashed* kernel. > > > >> > > > > >> >> 5). Both the above temporary solutions fix the problem. > > > >> >> > > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > > > >> >> fail. > > > >> >> > > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > > > >> >> dt node 'linux,usable-memory-range' > > > >> > > > > >> > I still don't understand why we need to carry over the information > > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > > > >> > such regions are free to be reused by the kernel after some point of > > > >> > initialization. Why does crash dump kernel need to know about them? > > > >> > > > > >> > > > >> Not really. According to the UEFI spec, they can be reclaimed after > > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > > > >> no longer needs them.
Of course, in order to be able to boot a kexec > > > >> kernel, those regions needs to be preserved, which is why they are > > > >> memblock_reserve()'d now. > > > > > > > > For my better understandings, who is actually accessing such regions > > > > during boot time, uefi itself or efistub? > > > > > > > > > > No, only the kernel. This is where the ACPI tables are stored. For > > > instance, on QEMU we have > > > > > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > > > 01000013) > > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > > > BXPC 00000001) > > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > > > BXPC 00000001) > > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > > > BXPC 00000001) > > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > > > BXPC 00000001) > > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > > > BXPC 00000001) > > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > > > BXPC 00000001) > > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > > > BXPC 00000001) > > > > > > covered by > > > > > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > > > ... > > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > > > > OK. I mistakenly understood those regions could be freed after exiting > > UEFI boot services. > > > > > > > > >> So it seems that kexec does not honour the memblock_reserve() table > > > >> when booting the next kernel. > > > > > > > > not really. > > > > > > > >> > (In other words, can or should we skip some part of ACPI-related init code > > > >> > on crash dump kernel?) > > > >> > > > > >> > > > >> I don't think so. 
And the change to the handling of ACPI reclaim > > > >> regions only revealed the bug, not created it (given that other > > > >> memblock_reserve regions may be affected as well) > > > > > > > > As whether we should honor such reserved regions over kexec'ing > > > > depends on each one's specific nature, we will have to take care one-by-one. > > > > As a matter of fact, no information about "reserved" memblocks is > > > > exposed to user space (via proc/iomem). > > > > > > > > > > That is why I suggested (somewhere in this thread?) to not expose them > > > as 'System RAM'. Do you think that could solve this? > > > > Memblock-reserv'ing them is necessary to prevent their corruption and > > marking them under another name in /proc/iomem would also be good in order > > not to allocate them as part of crash kernel's memory. > > > > But I'm not still convinced that we should export them in useable- > > memory-range to crash dump kernel. They will be accessed through > > acpi_os_map_memory() and so won't be required to be part of system ram > > (or memblocks), I guess. > > -> Bhupesh? > > I forgot how arm64 kernel retrieve the memory ranges and initialize > them. If no "e820" like interfaces shouldn't kernel reinitialize all > the memory according to the efi memmap? For kdump kernel anything other > than usable memory (which is from the dt node instead) should be > reinitialized according to efi passed info, no? All the regions exported in efi memmap will be added to memblock.memory in (u)efi_init() and then trimmed down to the exact range specified as usable-memory-range by fdt_enforce_memory_region(). Now I noticed that the current fdt_enforce_memory_region() may not work well with multiple entries in usable-memory-range. > > > > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > > via a kernel command line parameter, "memmap=". > > memmap= is only used in old kexec-tools, now we are passing them via > e820 table. Thanks. 
I remember that you have explained it before. -Takahiro AKASHI > [snip] > > Thanks > Dave ^ permalink raw reply [flat|nested] 135+ messages in thread
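For illustration, the trimming described in the message above (every EFI memmap region is added to memblock.memory, then the list is capped to the single usable-memory-range) can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct region`, `cap_to_range()` and `is_covered()` are made-up names for this example, not kernel APIs, and in the kernel the capping itself is done by memblock_cap_memory_range().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memblock region: [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Clip every region in 'mem' to the window [cap->base, cap->base + cap->size)
 * and drop regions that fall entirely outside it. The array is compacted in
 * place and the new region count is returned. This models the effect of
 * capping the memory list to a single usable range.
 */
static size_t cap_to_range(struct region *mem, size_t n,
			   const struct region *cap)
{
	uint64_t cap_end;
	size_t out = 0;

	assert(mem != NULL && cap != NULL);
	cap_end = cap->base + cap->size;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = mem[i].base;
		uint64_t end = mem[i].base + mem[i].size;

		if (start < cap->base)
			start = cap->base;
		if (end > cap_end)
			end = cap_end;
		if (start >= end)
			continue;	/* nothing of this region survives */

		mem[out].base = start;
		mem[out].size = end - start;
		out++;
	}
	return out;
}

/* Return 1 if [addr, addr + len) lies entirely inside one region, else 0. */
static int is_covered(const struct region *mem, size_t n,
		      uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= mem[i].base &&
		    addr + len <= mem[i].base + mem[i].size)
			return 1;
	}
	return 0;
}
```

With the crashkernel reservation from the kexec log quoted earlier (0x0e800000-0x2e7fffff) as the cap, a region starting at 0x39620000 is dropped entirely, so an address such as 0x39710000 is no longer covered by any memory region. The 0x39620000 region is an illustrative value chosen to sit in the gap between the coredump ranges; it is not a value reported in the thread. Capping to one window also shows why a usable-memory-range with several entries would need a different approach: a second entry outside the first window would be discarded by the first cap.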
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-19 6:09 ` AKASHI Takahiro @ 2017-12-19 13:09 ` Ard Biesheuvel -1 siblings, 0 replies; 135+ messages in thread From: Ard Biesheuvel @ 2017-12-19 13:09 UTC (permalink / raw) To: AKASHI Takahiro, Dave Young, Ard Biesheuvel, Bhupesh Sharma, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On 19 December 2017 at 07:09, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > > >> > Bhupesh, Ard, >> > > >> > >> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> > > >> >> Hi Ard, Akashi >> > > >> >> >> > > >> > (snip) >> > > >> > >> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> > > >> >> identify its own usable memory and exclude, at its boot time, any >> > > >> >> other memory areas that are part of the panicked kernel's memory. >> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> > > >> >> , for details) >> > > >> > >> > > >> > Right. >> > > >> > >> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >> > > >> >> with the crashkernel memory range: >> > > >> >> >> > > >> >> /* add linux,usable-memory-range */ >> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> > > >> >> address_cells, size_cells); >> > > >> >> >> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> > > >> >> , for details) >> > > >> >> >> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> > > >> >> they are marked as System RAM or as RESERVED. As, >> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> > > >> >> >> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> > > >> >> ACPI memory and crashes while trying to access the same: >> > > >> >> >> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> > > >> >> -r`.img --reuse-cmdline -d >> > > >> >> >> > > >> >> [snip..] >> > > >> >> >> > > >> >> Reserved memory range >> > > >> >> 000000000e800000-000000002e7fffff (0) >> > > >> >> >> > > >> >> Coredump memory ranges >> > > >> >> 0000000000000000-000000000e7fffff (0) >> > > >> >> 000000002e800000-000000003961ffff (0) >> > > >> >> 0000000039d40000-000000003ed2ffff (0) >> > > >> >> 000000003ed60000-000000003fbfffff (0) >> > > >> >> 0000001040000000-0000001ffbffffff (0) >> > > >> >> 0000002000000000-0000002ffbffffff (0) >> > > >> >> 0000009000000000-0000009ffbffffff (0) >> > > >> >> 000000a000000000-000000affbffffff (0) >> > > >> >> >> > > >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the >> > > >> >> memory cap'ing passed to the crash kernel inside >> > > >> >> 'arch/arm64/mm/init.c' (see below): >> > > >> >> >> > > >> >> static void __init fdt_enforce_memory_region(void) >> > > >> >> { >> > > >> >> struct memblock_region reg = { >> > > >> >> .size = 0, >> > > >> >> }; >> > > >> >> >> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >> > > >> >> >> > > >> >> if (reg.size) >> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> > > >> >> comment this out */ >> > > >> >> } >> > > >> > >> > > >> > Please just don't do that. It can cause a fatal damage on >> > > >> > memory contents of the *crashed* kernel. >> > > >> > >> > > >> >> 5). Both the above temporary solutions fix the problem. >> > > >> >> >> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not >> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> > > >> >> fail. >> > > >> >> >> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> > > >> >> dt node 'linux,usable-memory-range' >> > > >> > >> > > >> > I still don't understand why we need to carry over the information >> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> > > >> > such regions are free to be reused by the kernel after some point of >> > > >> > initialization. Why does crash dump kernel need to know about them? >> > > >> > >> > > >> >> > > >> Not really. According to the UEFI spec, they can be reclaimed after >> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> > > >> no longer needs them.
Of course, in order to be able to boot a kexec >> > > >> kernel, those regions needs to be preserved, which is why they are >> > > >> memblock_reserve()'d now. >> > > > >> > > > For my better understandings, who is actually accessing such regions >> > > > during boot time, uefi itself or efistub? >> > > > >> > > >> > > No, only the kernel. This is where the ACPI tables are stored. For >> > > instance, on QEMU we have >> > > >> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> > > 01000013) >> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> > > BXPC 00000001) >> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> > > BXPC 00000001) >> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> > > BXPC 00000001) >> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> > > BXPC 00000001) >> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> > > BXPC 00000001) >> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> > > BXPC 00000001) >> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> > > BXPC 00000001) >> > > >> > > covered by >> > > >> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> > > ... >> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> > >> > OK. I mistakenly understood those regions could be freed after exiting >> > UEFI boot services. >> > >> > > >> > > >> So it seems that kexec does not honour the memblock_reserve() table >> > > >> when booting the next kernel. >> > > > >> > > > not really. >> > > > >> > > >> > (In other words, can or should we skip some part of ACPI-related init code >> > > >> > on crash dump kernel?) >> > > >> > >> > > >> >> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >> > > >> regions only revealed the bug, not created it (given that other >> > > >> memblock_reserve regions may be affected as well) >> > > > >> > > > As whether we should honor such reserved regions over kexec'ing >> > > > depends on each one's specific nature, we will have to take care one-by-one. >> > > > As a matter of fact, no information about "reserved" memblocks is >> > > > exposed to user space (via proc/iomem). >> > > > >> > > >> > > That is why I suggested (somewhere in this thread?) to not expose them >> > > as 'System RAM'. Do you think that could solve this? >> > >> > Memblock-reserv'ing them is necessary to prevent their corruption and >> > marking them under another name in /proc/iomem would also be good in order >> > not to allocate them as part of crash kernel's memory. >> > >> > But I'm not still convinced that we should export them in useable- >> > memory-range to crash dump kernel. They will be accessed through >> > acpi_os_map_memory() and so won't be required to be part of system ram >> > (or memblocks), I guess. >> > -> Bhupesh? >> >> I forgot how arm64 kernel retrieve the memory ranges and initialize >> them. If no "e820" like interfaces shouldn't kernel reinitialize all >> the memory according to the efi memmap? For kdump kernel anything other >> than usable memory (which is from the dt node instead) should be >> reinitialized according to efi passed info, no? > > All the regions exported in efi memmap will be added to memblock.memory > in (u)efi_init() and then trimmed down to the exact range specified as > usable-memory-range by fdt_enforce_memory_region(). > > Now I noticed that the current fdt_enforce_memory_region() may not work well > with multiple entries in usable-memory-range. > In any case, the root of the problem is that memory regions lose their 'memory' annotation due to the way the memory map is mangled before being supplied to the kexec kernel. 
Would it be possible to classify all memory that we want to hide from the kexec kernel as NOMAP instead? That way, it will not be mapped implicitly, but will still be mapped cacheable by acpi_os_ioremap(), so this seems to be the most appropriate way to deal with the host kernel's memory contents. >> > >> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> > via a kernel command line parameter, "memmap=". >> >> memmap= is only used in old kexec-tools, now we are passing them via >> e820 table. > > Thanks. I remember that you have explained it before. > > -Takahiro AKASHI > >> [snip] >> >> Thanks >> Dave ^ permalink raw reply [flat|nested] 135+ messages in thread
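To make the NOMAP suggestion above easier to follow, here is a minimal toy model in C. It is not the kernel's memblock code. It assumes that capping memory to the usable range clips or drops ordinary regions but leaves NOMAP regions alone, which is the behaviour the proposal relies on. The names toy_region, toy_cap_memory_range and toy_is_described are hypothetical, and any address fed to it for an ACPI reclaim region is only an illustrative placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memblock.memory entry; not the kernel's struct. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	bool nomap;	/* models a MEMBLOCK_NOMAP-style flag (assumption) */
};

/*
 * Toy model of capping memory to [base, base + size): regions that are
 * not NOMAP are clipped to the window, or dropped if nothing is left,
 * while NOMAP regions are kept untouched. Compacts the array in place
 * and returns the new region count.
 */
size_t toy_cap_memory_range(struct toy_region *r, size_t cnt,
			    uint64_t base, uint64_t size)
{
	uint64_t end;
	size_t i, out = 0;

	assert(size <= UINT64_MAX - base);	/* window must not wrap */
	end = base + size;

	for (i = 0; i < cnt; i++) {
		uint64_t rb = r[i].base;
		uint64_t re = r[i].base + r[i].size;

		if (!r[i].nomap) {
			if (rb < base)
				rb = base;
			if (re > end)
				re = end;
			if (rb >= re)
				continue;	/* entirely outside the window */
		}
		r[out].base = rb;
		r[out].size = re - rb;
		r[out].nomap = r[i].nomap;
		out++;
	}
	return out;
}

/* True if addr is still described by some region after capping. */
bool toy_is_described(const struct toy_region *r, size_t cnt, uint64_t addr)
{
	size_t i;

	for (i = 0; i < cnt; i++)
		if (addr >= r[i].base && addr - r[i].base < r[i].size)
			return true;
	return false;
}
```

With the crash kernel window taken from the 'kexec -d' output (base 0x0e800000, size 0x20000000), a region placed above that window disappears from the toy map unless it is flagged nomap, in which case it stays described but would not be treated as ordinary usable memory.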
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-19 13:09 ` Ard Biesheuvel @ 2017-12-20 20:00 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-20 20:00 UTC (permalink / raw) To: Ard Biesheuvel Cc: AKASHI Takahiro, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-TuqUDEhatI4ANWPb/1PvSmm0pvjS0E/A On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On 19 December 2017 at 07:09, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >>> > > >> > Bhupesh, Ard, >>> > > >> > >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >>> > > >> >> Hi Ard, Akashi >>> > > >> >> >>> > > >> > (snip) >>> > > >> > >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >>> > > >> >> , for details) >>> > > >> > >>> > > >> > Right. >>> > > >> > >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >>> > > >> >> with the crashkernel memory range: >>> > > >> >> >>> > > >> >> /* add linux,usable-memory-range */ >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >>> > > >> >> address_cells, size_cells); >>> > > >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >>> > > >> >> , for details) >>> > > >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >>> > > >> >> they are marked as System RAM or as RESERVED. As, >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >>> > > >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >>> > > >> >> ACPI memory and crashes while trying to access the same: >>> > > >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >>> > > >> >> -r`.img --reuse-cmdline -d >>> > > >> >> >>> > > >> >> [snip..] >>> > > >> >> >>> > > >> >> Reserved memory range >>> > > >> >> 000000000e800000-000000002e7fffff (0) >>> > > >> >> >>> > > >> >> Coredump memory ranges >>> > > >> >> 0000000000000000-000000000e7fffff (0) >>> > > >> >> 000000002e800000-000000003961ffff (0) >>> > > >> >> 0000000039d40000-000000003ed2ffff (0) >>> > > >> >> 000000003ed60000-000000003fbfffff (0) >>> > > >> >> 0000001040000000-0000001ffbffffff (0) >>> > > >> >> 0000002000000000-0000002ffbffffff (0) >>> > > >> >> 0000009000000000-0000009ffbffffff (0) >>> > > >> >> 000000a000000000-000000affbffffff (0) >>> > > >> >> >>> > > >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the >>> > > >> >> memory cap'ing passed to the crash kernel inside >>> > > >> >> 'arch/arm64/mm/init.c' (see below): >>> > > >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void) >>> > > >> >> { >>> > > >> >> struct memblock_region reg = { >>> > > >> >> .size = 0, >>> > > >> >> }; >>> > > >> >> >>> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); >>> > > >> >> >>> > > >> >> if (reg.size) >>> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >>> > > >> >> comment this out */ >>> > > >> >> } >>> > > >> > >>> > > >> > Please just don't do that. It can cause a fatal damage on >>> > > >> > memory contents of the *crashed* kernel. >>> > > >> > >>> > > >> >> 5). Both the above temporary solutions fix the problem. >>> > > >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to >>> > > >> >> fail. >>> > > >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >>> > > >> >> dt node 'linux,usable-memory-range' >>> > > >> > >>> > > >> > I still don't understand why we need to carry over the information >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >>> > > >> > such regions are free to be reused by the kernel after some point of >>> > > >> > initialization. Why does crash dump kernel need to know about them? >>> > > >> > >>> > > >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >>> > > >> no longer needs them. 
Of course, in order to be able to boot a kexec >>> > > >> kernel, those regions needs to be preserved, which is why they are >>> > > >> memblock_reserve()'d now. >>> > > > >>> > > > For my better understandings, who is actually accessing such regions >>> > > > during boot time, uefi itself or efistub? >>> > > > >>> > > >>> > > No, only the kernel. This is where the ACPI tables are stored. For >>> > > instance, on QEMU we have >>> > > >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >>> > > 01000013) >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >>> > > BXPC 00000001) >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >>> > > BXPC 00000001) >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >>> > > BXPC 00000001) >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >>> > > BXPC 00000001) >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >>> > > BXPC 00000001) >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >>> > > BXPC 00000001) >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >>> > > BXPC 00000001) >>> > > >>> > > covered by >>> > > >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >>> > > ... >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >>> > >>> > OK. I mistakenly understood those regions could be freed after exiting >>> > UEFI boot services. >>> > >>> > > >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >>> > > >> when booting the next kernel. >>> > > > >>> > > > not really. >>> > > > >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >>> > > >> > on crash dump kernel?) >>> > > >> > >>> > > >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >>> > > >> regions only revealed the bug, not created it (given that other >>> > > >> memblock_reserve regions may be affected as well) >>> > > > >>> > > > As whether we should honor such reserved regions over kexec'ing >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >>> > > > As a matter of fact, no information about "reserved" memblocks is >>> > > > exposed to user space (via proc/iomem). >>> > > > >>> > > >>> > > That is why I suggested (somewhere in this thread?) to not expose them >>> > > as 'System RAM'. Do you think that could solve this? >>> > >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >>> > marking them under another name in /proc/iomem would also be good in order >>> > not to allocate them as part of crash kernel's memory. >>> > >>> > But I'm not still convinced that we should export them in useable- >>> > memory-range to crash dump kernel. They will be accessed through >>> > acpi_os_map_memory() and so won't be required to be part of system ram >>> > (or memblocks), I guess. >>> > -> Bhupesh? >>> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >>> the memory according to the efi memmap? For kdump kernel anything other >>> than usable memory (which is from the dt node instead) should be >>> reinitialized according to efi passed info, no? >> >> All the regions exported in efi memmap will be added to memblock.memory >> in (u)efi_init() and then trimmed down to the exact range specified as >> usable-memory-range by fdt_enforce_memory_region(). >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well >> with multiple entries in usable-memory-range. 
>> > > In any case, the root of the problem is that memory regions lose their > 'memory' annotation due to the way the memory map is mangled before > being supplied to the kexec kernel. > > Would it be possible to classify all memory that we want to hide from > the kexec kernel as NOMAP instead? That way, it will not be mapped > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > so this seems to be the most appropriate way to deal with the host > kernel's memory contents. Hmm, wouldn't appending the ACPI reclaim regions to 'linux,usable-memory-range' in the dtb being passed to the crashkernel be better? It indirectly achieves a similar objective (although the result may be only a subset of all System RAM regions in the primary kernel's memory). I am not aware of the background of the current kexec-tools implementation, where we add only the crashkernel range to the dtb being passed to the crashkernel. Akashi can probably answer better how we arrived at this design and why we didn't want to expose all System RAM regions (i.e. !NOMAP regions) to the crashkernel. I suspect that some issues were seen when the System RAM (!NOMAP) regions were exposed to the crashkernel, and that is why we settled on this design, but this is just my guess. Regards, Bhupesh >>> > >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >>> > via a kernel command line parameter, "memmap=". >>> >>> memmap= is only used in old kexec-tools, now we are passing them via >>> e820 table. >> >> Thanks. I remember that you have explained it before. >> >> -Takahiro AKASHI >> >>> [snip] >>> >>> Thanks >>> Dave ^ permalink raw reply [flat|nested] 135+ messages in thread
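As a rough illustration of what appending extra ranges to 'linux,usable-memory-range' would mean at the data level, the sketch below packs several (base, size) pairs into the big-endian 32-bit cells that flattened device tree properties use. It does not call libfdt or kexec-tools; toy_range, toy_put_cells and toy_encode_ranges are hypothetical helpers written for this note. Whether the crash kernel would actually honour more than one pair is a separate question: as noted earlier in the thread, fdt_enforce_memory_region() may not work well with multiple entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (base, size) pair; hypothetical type for this sketch only. */
struct toy_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Store val as `cells` big-endian 32-bit cells (1 or 2), most
 * significant cell first. Returns the number of bytes written.
 */
size_t toy_put_cells(uint8_t *buf, uint64_t val, unsigned int cells)
{
	unsigned int i;

	assert(cells == 1 || cells == 2);
	for (i = 0; i < cells; i++) {
		uint32_t cell = (uint32_t)(val >> (32 * (cells - 1 - i)));

		buf[4 * i + 0] = (uint8_t)(cell >> 24);
		buf[4 * i + 1] = (uint8_t)(cell >> 16);
		buf[4 * i + 2] = (uint8_t)(cell >> 8);
		buf[4 * i + 3] = (uint8_t)cell;
	}
	return 4 * (size_t)cells;
}

/*
 * Pack cnt ranges back to back as <base size> pairs.
 * Returns the property length in bytes, or 0 if buf is too small.
 */
size_t toy_encode_ranges(uint8_t *buf, size_t buflen,
			 const struct toy_range *r, size_t cnt,
			 unsigned int address_cells, unsigned int size_cells)
{
	size_t need = cnt * 4 * ((size_t)address_cells + size_cells);
	size_t i, off = 0;

	if (need > buflen)
		return 0;

	for (i = 0; i < cnt; i++) {
		off += toy_put_cells(buf + off, r[i].base, address_cells);
		off += toy_put_cells(buf + off, r[i].size, size_cells);
	}
	return off;
}
```

For example, with two address cells and two size cells, the crash kernel range (0x0e800000, 0x20000000) followed by one additional placeholder range yields a 32-byte property value.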
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-20 20:00 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-20 20:00 UTC (permalink / raw) To: linux-arm-kernel On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 19 December 2017 at 07:09, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >>> > > <takahiro.akashi@linaro.org> wrote: >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >>> > > >> <takahiro.akashi@linaro.org> wrote: >>> > > >> > Bhupesh, Ard, >>> > > >> > >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >>> > > >> >> Hi Ard, Akashi >>> > > >> >> >>> > > >> > (snip) >>> > > >> > >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >>> > > >> >> , for details) >>> > > >> > >>> > > >> > Right. >>> > > >> > >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >>> > > >> >> with the crashkernel memory range: >>> > > >> >> >>> > > >> >> /* add linux,usable-memory-range */ >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >>> > > >> >> address_cells, size_cells); >>> > > >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >>> > > >> >> , for details) >>> > > >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >>> > > >> >> they are marked as System RAM or as RESERVED. As, >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >>> > > >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >>> > > >> >> ACPI memory and crashes while trying to access the same: >>> > > >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >>> > > >> >> -r`.img --reuse-cmdline -d >>> > > >> >> >>> > > >> >> [snip..] >>> > > >> >> >>> > > >> >> Reserved memory range >>> > > >> >> 000000000e800000-000000002e7fffff (0) >>> > > >> >> >>> > > >> >> Coredump memory ranges >>> > > >> >> 0000000000000000-000000000e7fffff (0) >>> > > >> >> 000000002e800000-000000003961ffff (0) >>> > > >> >> 0000000039d40000-000000003ed2ffff (0) >>> > > >> >> 000000003ed60000-000000003fbfffff (0) >>> > > >> >> 0000001040000000-0000001ffbffffff (0) >>> > > >> >> 0000002000000000-0000002ffbffffff (0) >>> > > >> >> 0000009000000000-0000009ffbffffff (0) >>> > > >> >> 000000a000000000-000000affbffffff (0) >>> > > >> >> >>> > > >> >> 4). 
So if we revert Ard's patch or just comment the fixing up of the >>> > > >> >> memory cap'ing passed to the crash kernel inside >>> > > >> >> 'arch/arm64/mm/init.c' (see below): >>> > > >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void) >>> > > >> >> { >>> > > >> >> struct memblock_region reg = { >>> > > >> >> .size = 0, >>> > > >> >> }; >>> > > >> >> >>> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >>> > > >> >> >>> > > >> >> if (reg.size) >>> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >>> > > >> >> comment this out */ >>> > > >> >> } >>> > > >> > >>> > > >> > Please just don't do that. It can cause a fatal damage on >>> > > >> > memory contents of the *crashed* kernel. >>> > > >> > >>> > > >> >> 5). Both the above temporary solutions fix the problem. >>> > > >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to >>> > > >> >> fail. >>> > > >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >>> > > >> >> dt node 'linux,usable-memory-range' >>> > > >> > >>> > > >> > I still don't understand why we need to carry over the information >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >>> > > >> > such regions are free to be reused by the kernel after some point of >>> > > >> > initialization. Why does crash dump kernel need to know about them? >>> > > >> > >>> > > >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >>> > > >> no longer needs them. 
Of course, in order to be able to boot a kexec >>> > > >> kernel, those regions needs to be preserved, which is why they are >>> > > >> memblock_reserve()'d now. >>> > > > >>> > > > For my better understandings, who is actually accessing such regions >>> > > > during boot time, uefi itself or efistub? >>> > > > >>> > > >>> > > No, only the kernel. This is where the ACPI tables are stored. For >>> > > instance, on QEMU we have >>> > > >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >>> > > 01000013) >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >>> > > BXPC 00000001) >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >>> > > BXPC 00000001) >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >>> > > BXPC 00000001) >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >>> > > BXPC 00000001) >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >>> > > BXPC 00000001) >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >>> > > BXPC 00000001) >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >>> > > BXPC 00000001) >>> > > >>> > > covered by >>> > > >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >>> > > ... >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >>> > >>> > OK. I mistakenly understood those regions could be freed after exiting >>> > UEFI boot services. >>> > >>> > > >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >>> > > >> when booting the next kernel. >>> > > > >>> > > > not really. >>> > > > >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >>> > > >> > on crash dump kernel?) >>> > > >> > >>> > > >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >>> > > >> regions only revealed the bug, not created it (given that other >>> > > >> memblock_reserve regions may be affected as well) >>> > > > >>> > > > As whether we should honor such reserved regions over kexec'ing >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >>> > > > As a matter of fact, no information about "reserved" memblocks is >>> > > > exposed to user space (via proc/iomem). >>> > > > >>> > > >>> > > That is why I suggested (somewhere in this thread?) to not expose them >>> > > as 'System RAM'. Do you think that could solve this? >>> > >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >>> > marking them under another name in /proc/iomem would also be good in order >>> > not to allocate them as part of crash kernel's memory. >>> > >>> > But I'm not still convinced that we should export them in useable- >>> > memory-range to crash dump kernel. They will be accessed through >>> > acpi_os_map_memory() and so won't be required to be part of system ram >>> > (or memblocks), I guess. >>> > -> Bhupesh? >>> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >>> the memory according to the efi memmap? For kdump kernel anything other >>> than usable memory (which is from the dt node instead) should be >>> reinitialized according to efi passed info, no? >> >> All the regions exported in efi memmap will be added to memblock.memory >> in (u)efi_init() and then trimmed down to the exact range specified as >> usable-memory-range by fdt_enforce_memory_region(). >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well >> with multiple entries in usable-memory-range. 
>> > > In any case, the root of the problem is that memory regions lose their > 'memory' annotation due to the way the memory map is mangled before > being supplied to the kexec kernel. > > Would it be possible to classify all memory that we want to hide from > the kexec kernel as NOMAP instead? That way, it will not be mapped > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > so this seems to be the most appropriate way to deal with the host > kernel's memory contents. Hmm. Wouldn't appending the acpi reclaim regions to 'linux,usable-memory-range' in the dtb being passed to the crashkernel be better? Because it's indirectly achieving a similar objective (although may be a subset of all System RAM regions on the primary kernel's memory). I am not aware of the background about the current kexec-tools implementation where we add only the crashkernel range to the dtb being passed to the crashkernel. Probably Akashi can answer better, as to how we arrived at this design approach and why we didn't want to expose all System RAM regions (i.e. ! NOMAP regions) to the crashkernel. I am suspecting that some issues were seen/met when the System RAM (! NOMAP regions) were exposed to the crashkernel, and that's why we finalized on this design approach, but this is something which is just my guess. Regards, Bhupesh >>> > >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >>> > via a kernel command line parameter, "memmap=". >>> >>> memmap= is only used in old kexec-tools, now we are passing them via >>> e820 table. >> >> Thanks. I remember that you have explained it before. >> >> -Takahiro AKASHI >> >>> [snip] >>> >>> Thanks >>> Dave ^ permalink raw reply [flat|nested] 135+ messages in thread
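For readers following the 'linux,usable-memory-range' discussion above: the alternative Bhupesh raises would mean the property carries more than one <base size> pair (the crashkernel range plus each ACPI reclaim range), which is also the multi-entry case Akashi notes fdt_enforce_memory_region() may not handle. The following is only a userspace sketch of how such pairs would be laid out as big-endian cells, assuming #address-cells = 2 and #size-cells = 2. The names struct mem_range, put_be64() and pack_usable_ranges() are hypothetical and are not taken from kexec-tools or libfdt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one physical memory range (base, size). */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Store a 64-bit value as two big-endian 32-bit cells, as a DT property holds it. */
static void put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/*
 * Pack 'n' ranges into 'buf' as consecutive <base size> pairs, assuming
 * #address-cells = 2 and #size-cells = 2 (16 bytes per range).
 * Returns the number of bytes written, or 0 if 'buf' is too small.
 */
size_t pack_usable_ranges(uint8_t *buf, size_t buflen,
			  const struct mem_range *r, size_t n)
{
	size_t need = n * 16;

	if (need > buflen)
		return 0;
	for (size_t i = 0; i < n; i++) {
		put_be64(buf + 16 * i, r[i].base);
		put_be64(buf + 16 * i + 8, r[i].size);
	}
	return need;
}
```

A consumer of such a property would have to walk every 16-byte pair rather than read only the first one, which is the part of the kernel side that would need attention under this approach.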
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-20 20:00 ` Bhupesh Sharma (?) @ 2017-12-21 10:34 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-21 10:34 UTC (permalink / raw) To: Bhupesh Sharma Cc: Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Bhupesh, Can you test the patch attached below, please? It is intended to retain already-reserved regions (ACPI reclaim memory in this case) in system ram (i.e. memblock.memory) without explicitly exporting them via usable-memory-range. (I still have to figure out what the side-effect of this patch is.) Thanks, -Takahiro AKASHI On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > On 19 December 2017 at 07:09, AKASHI Takahiro > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >>> > > >> > Bhupesh, Ard, > >>> > > >> > > >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >>> > > >> >> Hi Ard, Akashi > >>> > > >> >> > >>> > > >> > (snip) > >>> > > >> > > >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >>> > > >> >> 
'linux,usable-memory-range' dt property to allow crash dump kernel to > >>> > > >> >> identify its own usable memory and exclude, at its boot time, any > >>> > > >> >> other memory areas that are part of the panicked kernel's memory. > >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >>> > > >> >> , for details) > >>> > > >> > > >>> > > >> > Right. > >>> > > >> > > >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >>> > > >> >> with the crashkernel memory range: > >>> > > >> >> > >>> > > >> >> /* add linux,usable-memory-range */ > >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >>> > > >> >> address_cells, size_cells); > >>> > > >> >> > >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >>> > > >> >> , for details) > >>> > > >> >> > >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >>> > > >> >> > >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >>> > > >> >> ACPI memory and crashes while trying to access the same: > >>> > > >> >> > >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >>> > > >> >> -r`.img --reuse-cmdline -d > >>> > > >> >> > >>> > > >> >> [snip..] 
> >>> > > >> >> > >>> > > >> >> Reserved memory range > >>> > > >> >> 000000000e800000-000000002e7fffff (0) > >>> > > >> >> > >>> > > >> >> Coredump memory ranges > >>> > > >> >> 0000000000000000-000000000e7fffff (0) > >>> > > >> >> 000000002e800000-000000003961ffff (0) > >>> > > >> >> 0000000039d40000-000000003ed2ffff (0) > >>> > > >> >> 000000003ed60000-000000003fbfffff (0) > >>> > > >> >> 0000001040000000-0000001ffbffffff (0) > >>> > > >> >> 0000002000000000-0000002ffbffffff (0) > >>> > > >> >> 0000009000000000-0000009ffbffffff (0) > >>> > > >> >> 000000a000000000-000000affbffffff (0) > >>> > > >> >> > >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the > >>> > > >> >> memory cap'ing passed to the crash kernel inside > >>> > > >> >> 'arch/arm64/mm/init.c' (see below): > >>> > > >> >> > >>> > > >> >> static void __init fdt_enforce_memory_region(void) > >>> > > >> >> { > >>> > > >> >> struct memblock_region reg = { > >>> > > >> >> .size = 0, > >>> > > >> >> }; > >>> > > >> >> > >>> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >>> > > >> >> > >>> > > >> >> if (reg.size) > >>> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >>> > > >> >> comment this out */ > >>> > > >> >> } > >>> > > >> > > >>> > > >> > Please just don't do that. It can cause a fatal damage on > >>> > > >> > memory contents of the *crashed* kernel. > >>> > > >> > > >>> > > >> >> 5). Both the above temporary solutions fix the problem. > >>> > > >> >> > >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >>> > > >> >> fail. > >>> > > >> >> > >>> > > >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are > >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >>> > > >> >> dt node 'linux,usable-memory-range' > >>> > > >> > > >>> > > >> > I still don't understand why we need to carry over the information > >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >>> > > >> > such regions are free to be reused by the kernel after some point of > >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >>> > > >> > > >>> > > >> > >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >>> > > >> kernel, those regions needs to be preserved, which is why they are > >>> > > >> memblock_reserve()'d now. > >>> > > > > >>> > > > For my better understandings, who is actually accessing such regions > >>> > > > during boot time, uefi itself or efistub? > >>> > > > > >>> > > > >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >>> > > instance, on QEMU we have > >>> > > > >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >>> > > 01000013) > >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >>> > > BXPC 00000001) > >>> > > > >>> > > covered by > >>> > > > >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >>> > > ... > >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >>> > > >>> > OK. I mistakenly understood those regions could be freed after exiting > >>> > UEFI boot services. > >>> > > >>> > > > >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >>> > > >> when booting the next kernel. > >>> > > > > >>> > > > not really. > >>> > > > > >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >>> > > >> > on crash dump kernel?) > >>> > > >> > > >>> > > >> > >>> > > >> I don't think so. And the change to the handling of ACPI reclaim > >>> > > >> regions only revealed the bug, not created it (given that other > >>> > > >> memblock_reserve regions may be affected as well) > >>> > > > > >>> > > > As whether we should honor such reserved regions over kexec'ing > >>> > > > depends on each one's specific nature, we will have to take care one-by-one. 
> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >>> > > > exposed to user space (via proc/iomem). > >>> > > > > >>> > > > >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >>> > > as 'System RAM'. Do you think that could solve this? > >>> > > >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >>> > marking them under another name in /proc/iomem would also be good in order > >>> > not to allocate them as part of crash kernel's memory. > >>> > > >>> > But I'm not still convinced that we should export them in useable- > >>> > memory-range to crash dump kernel. They will be accessed through > >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >>> > (or memblocks), I guess. > >>> > -> Bhupesh? > >>> > >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >>> the memory according to the efi memmap? For kdump kernel anything other > >>> than usable memory (which is from the dt node instead) should be > >>> reinitialized according to efi passed info, no? > >> > >> All the regions exported in efi memmap will be added to memblock.memory > >> in (u)efi_init() and then trimmed down to the exact range specified as > >> usable-memory-range by fdt_enforce_memory_region(). > >> > >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> with multiple entries in usable-memory-range. > >> > > > > In any case, the root of the problem is that memory regions lose their > > 'memory' annotation due to the way the memory map is mangled before > > being supplied to the kexec kernel. > > > > Would it be possible to classify all memory that we want to hide from > > the kexec kernel as NOMAP instead? 
That way, it will not be mapped > > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > > so this seems to be the most appropriate way to deal with the host > > kernel's memory contents. > > Hmm. Wouldn't appending the acpi reclaim regions to > 'linux,usable-memory-range' in the dtb being passed to the crashkernel > be better? Because it's indirectly achieving a similar objective > (although may be a subset of all System RAM regions on the primary > kernel's memory). > > I am not aware of the background about the current kexec-tools > implementation where we add only the crashkernel range to the dtb > being passed to the crashkernel. > > Probably Akashi can answer better, as to how we arrived at this design > approach and why we didn't want to expose all System RAM regions (i.e. > ! NOMAP regions) to the crashkernel. > > I am suspecting that some issues were seen/met when the System RAM (! > NOMAP regions) were exposed to the crashkernel, and that's why we > finalized on this design approach, but this is something which is just > my guess. > > Regards, > Bhupesh > > >>> > > >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >>> > via a kernel command line parameter, "memmap=". > >>> > >>> memmap= is only used in old kexec-tools, now we are passing them via > >>> e820 table. > >> > >> Thanks. I remember that you have explained it before. 
> >>
> >> -Takahiro AKASHI
> >>
> >>> [snip]
> >>>
> >>> Thanks
> >>> Dave

===8<==
From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Date: Thu, 21 Dec 2017 19:14:23 +0900
Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP

---
 arch/arm64/mm/init.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b900ca41..8175db94257b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
 	struct memblock_region reg = {
 		.size = 0,
 	};
+	u64 idx;
+	phys_addr_t start, end;
 
 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
 
-	if (reg.size)
-		memblock_cap_memory_range(reg.base, reg.size);
+	if (reg.size) {
+		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
+					&start, &end, NULL)
+			memblock_mark_nomap(start, end - start);
+		memblock_clear_nomap(reg.base, reg.size);
+	}
 }
 
 void __init arm64_memblock_init(void)
-- 
2.15.1

^ permalink raw reply related [flat|nested] 135+ messages in thread
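To make the intent of the patch above easier to follow, here is a small userspace toy model contrasting the two strategies. It is a sketch under loud assumptions, not kernel code: struct toy_region, the TOY_* flags and the toy_*() functions are all hypothetical, regions are assumed to be pre-split so that each one lies entirely inside or entirely outside the usable range, and "removed" stands in for a region being dropped from memblock.memory. toy_cap() approximates the old capping behaviour (drop every region outside the usable range unless it is NOMAP), while toy_mark_nomap() approximates the patched behaviour (mark free, i.e. non-reserved, regions NOMAP, then clear NOMAP on the usable range).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for the toy model; not the kernel's memblock flags. */
enum toy_flags {
	TOY_RESERVED = 1 << 0,	/* region was memblock_reserve()'d */
	TOY_NOMAP    = 1 << 1,	/* region must not be in the linear map */
	TOY_REMOVED  = 1 << 2,	/* region was dropped from the memory list */
};

struct toy_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* True if the region lies entirely within [base, base + size). */
static int toy_inside(const struct toy_region *r, uint64_t base, uint64_t size)
{
	return r->base >= base && r->base + r->size <= base + size;
}

/* Old behaviour (simplified): drop non-NOMAP regions outside the usable range. */
void toy_cap(struct toy_region *r, size_t n, uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < n; i++)
		if (!toy_inside(&r[i], base, size) && !(r[i].flags & TOY_NOMAP))
			r[i].flags |= TOY_REMOVED;
}

/*
 * Patched behaviour (simplified): mark every free (non-reserved) region
 * NOMAP, then clear NOMAP again on the usable range.
 */
void toy_mark_nomap(struct toy_region *r, size_t n, uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < n; i++)
		if (!(r[i].flags & TOY_RESERVED))
			r[i].flags |= TOY_NOMAP;
	for (size_t i = 0; i < n; i++)
		if (toy_inside(&r[i], base, size))
			r[i].flags &= ~(unsigned int)TOY_NOMAP;
}

/* A region is reachable via the linear map if it is present and not NOMAP. */
int toy_is_mapped(const struct toy_region *r)
{
	return !(r->flags & (TOY_REMOVED | TOY_NOMAP));
}
```

In this model a reserved region outside the usable range (the ACPI reclaim case) is dropped by toy_cap() but remains mapped after toy_mark_nomap(), while an ordinary free region outside the usable range is hidden by both. Whether the real patch has other side effects is left open in the message itself.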
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-21 10:34 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-21 10:34 UTC (permalink / raw) To: linux-arm-kernel Bhupesh, Can you test the patch attached below, please? It is intended to retain already-reserved regions (ACPI reclaim memory in this case) in system ram (i.e. memblock.memory) without explicitly exporting them via usable-memory-range. (I still have to figure out what the side-effect of this patch is.) Thanks, -Takahiro AKASHI On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: > > On 19 December 2017 at 07:09, AKASHI Takahiro > > <takahiro.akashi@linaro.org> wrote: > >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >>> > > <takahiro.akashi@linaro.org> wrote: > >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >>> > > >> <takahiro.akashi@linaro.org> wrote: > >>> > > >> > Bhupesh, Ard, > >>> > > >> > > >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >>> > > >> >> Hi Ard, Akashi > >>> > > >> >> > >>> > > >> > (snip) > >>> > > >> > > >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >>> > > >> >> identify its own usable memory and exclude, at its boot time, any > >>> > > >> >> other memory areas that are part of the panicked kernel's memory. 
> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >>> > > >> >> , for details) > >>> > > >> > > >>> > > >> > Right. > >>> > > >> > > >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >>> > > >> >> with the crashkernel memory range: > >>> > > >> >> > >>> > > >> >> /* add linux,usable-memory-range */ > >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >>> > > >> >> address_cells, size_cells); > >>> > > >> >> > >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >>> > > >> >> , for details) > >>> > > >> >> > >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >>> > > >> >> > >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >>> > > >> >> ACPI memory and crashes while trying to access the same: > >>> > > >> >> > >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >>> > > >> >> -r`.img --reuse-cmdline -d > >>> > > >> >> > >>> > > >> >> [snip..] 
> >>> > > >> >>
> >>> > > >> >> Reserved memory range
> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >>> > > >> >>
> >>> > > >> >> Coredump memory ranges
> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >>> > > >> >>
> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >>> > > >> >>
> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >>> > > >> >> {
> >>> > > >> >>         struct memblock_region reg = {
> >>> > > >> >>                 .size = 0,
> >>> > > >> >>         };
> >>> > > >> >>
> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >>> > > >> >>
> >>> > > >> >>         if (reg.size)
> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
> >>> > > >> >> comment this out */
> >>> > > >> >> }
> >>> > > >> >
> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >>> > > >> > memory contents of the *crashed* kernel.
> >>> > > >> >
> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >>> > > >> >>
> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >>> > > >> >> fail.
> >>> > > >> >>
> >>> > > >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are > >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >>> > > >> >> dt node 'linux,usable-memory-range' > >>> > > >> > > >>> > > >> > I still don't understand why we need to carry over the information > >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >>> > > >> > such regions are free to be reused by the kernel after some point of > >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >>> > > >> > > >>> > > >> > >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >>> > > >> kernel, those regions needs to be preserved, which is why they are > >>> > > >> memblock_reserve()'d now. > >>> > > > > >>> > > > For my better understandings, who is actually accessing such regions > >>> > > > during boot time, uefi itself or efistub? > >>> > > > > >>> > > > >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >>> > > instance, on QEMU we have > >>> > > > >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >>> > > 01000013) > >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >>> > > BXPC 00000001) > >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >>> > > BXPC 00000001) > >>> > > > >>> > > covered by > >>> > > > >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >>> > > ... > >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >>> > > >>> > OK. I mistakenly understood those regions could be freed after exiting > >>> > UEFI boot services. > >>> > > >>> > > > >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >>> > > >> when booting the next kernel. > >>> > > > > >>> > > > not really. > >>> > > > > >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >>> > > >> > on crash dump kernel?) > >>> > > >> > > >>> > > >> > >>> > > >> I don't think so. And the change to the handling of ACPI reclaim > >>> > > >> regions only revealed the bug, not created it (given that other > >>> > > >> memblock_reserve regions may be affected as well) > >>> > > > > >>> > > > As whether we should honor such reserved regions over kexec'ing > >>> > > > depends on each one's specific nature, we will have to take care one-by-one. 
> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >>> > > > exposed to user space (via proc/iomem). > >>> > > > > >>> > > > >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >>> > > as 'System RAM'. Do you think that could solve this? > >>> > > >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >>> > marking them under another name in /proc/iomem would also be good in order > >>> > not to allocate them as part of crash kernel's memory. > >>> > > >>> > But I'm not still convinced that we should export them in useable- > >>> > memory-range to crash dump kernel. They will be accessed through > >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >>> > (or memblocks), I guess. > >>> > -> Bhupesh? > >>> > >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >>> the memory according to the efi memmap? For kdump kernel anything other > >>> than usable memory (which is from the dt node instead) should be > >>> reinitialized according to efi passed info, no? > >> > >> All the regions exported in efi memmap will be added to memblock.memory > >> in (u)efi_init() and then trimmed down to the exact range specified as > >> usable-memory-range by fdt_enforce_memory_region(). > >> > >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> with multiple entries in usable-memory-range. > >> > > > > In any case, the root of the problem is that memory regions lose their > > 'memory' annotation due to the way the memory map is mangled before > > being supplied to the kexec kernel. > > > > Would it be possible to classify all memory that we want to hide from > > the kexec kernel as NOMAP instead? 
That way, it will not be mapped > > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > > so this seems to be the most appropriate way to deal with the host > > kernel's memory contents. > > Hmm. wouldn't appending the acpi reclaim regions to > 'linux,usable-memory-range' in the dtb being passed to the crashkernel > be better? Because its indirectly achieving a similar objective > (although may be a subset of all System RAM regions on the primary > kernel's memory). > > I am not aware of the background about the current kexec-tools > implementation where we add only the crashkernel range to the dtb > being passed to the crashkernel. > > Probably Akashi can answer better, as to how we arrived at this design > approach and why we didn't want to expose all System RAM regions (i.e. > ! NOMPAP regions) to the crashkernel. > > I am suspecting that some issues were seen/meet when the System RAM (! > NOMAP regions) were exposed to the crashkernel, and that's why we > finalized on this design approach, but this is something which is just > my guess. > > Regards, > Bhupesh > > >>> > > >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >>> > via a kernel command line parameter, "memmap=". > >>> > >>> memmap= is only used in old kexec-tools, now we are passing them via > >>> e820 table. > >> > >> Thanks. I remember that you have explained it before. 
> >>
> >> -Takahiro AKASHI
> >>
> >>> [snip]
> >>>
> >>> Thanks
> >>> Dave

===8<==
From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro <takahiro.akashi@linaro.org>
Date: Thu, 21 Dec 2017 19:14:23 +0900
Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP

---
 arch/arm64/mm/init.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b900ca41..8175db94257b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
 	struct memblock_region reg = {
 		.size = 0,
 	};
+	u64 idx;
+	phys_addr_t start, end;
 
 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
 
-	if (reg.size)
-		memblock_cap_memory_range(reg.base, reg.size);
+	if (reg.size) {
+		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
+					&start, &end, NULL)
+			memblock_mark_nomap(start, end - start);
+		memblock_clear_nomap(reg.base, reg.size);
+	}
 }
 
 void __init arm64_memblock_init(void)
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 135+ messages in thread
[parent not found: <20171221103440.GJ28046-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>]
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-21 10:34 ` AKASHI Takahiro (?) @ 2017-12-21 12:06 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-21 12:06 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Hello Akashi, On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > Bhupesh, > > Can you test the patch attached below, please? > > It is intended to retain already-reserved regions (ACPI reclaim memory > in this case) in system ram (i.e. memblock.memory) without explicitly > exporting them via usable-memory-range. > (I still have to figure out what the side-effect of this patch is.) > > Thanks, > -Takahiro AKASHI > > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > On 19 December 2017 at 07:09, AKASHI Takahiro >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >>> > > >> > Bhupesh, Ard, >> >>> > > >> > >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: 
>> >>> > > >> >> Hi Ard, Akashi >> >>> > > >> >> >> >>> > > >> > (snip) >> >>> > > >> > >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >>> > > >> >> , for details) >> >>> > > >> > >> >>> > > >> > Right. >> >>> > > >> > >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >>> > > >> >> with the crashkernel memory range: >> >>> > > >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >>> > > >> >> address_cells, size_cells); >> >>> > > >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >>> > > >> >> , for details) >> >>> > > >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >>> > > >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >>> > > >> >> ACPI memory and crashes while trying to access the same: >> >>> > > >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >>> > > >> >> -r`.img --reuse-cmdline -d >> >>> > > >> >> >> >>> > > >> >> [snip..] 
>> >>> > > >> >>
>> >>> > > >> >> Reserved memory range
>> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >>> > > >> >>
>> >>> > > >> >> Coredump memory ranges
>> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >>> > > >> >>
>> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> > > >> >>
>> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >>> > > >> >> {
>> >>> > > >> >>         struct memblock_region reg = {
>> >>> > > >> >>                 .size = 0,
>> >>> > > >> >>         };
>> >>> > > >> >>
>> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> > > >> >>
>> >>> > > >> >>         if (reg.size)
>> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >>> > > >> >> comment this out */
>> >>> > > >> >> }
>> >>> > > >> >
>> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >>> > > >> > memory contents of the *crashed* kernel.
>> >>> > > >> >
>> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >>> > > >> >>
>> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> > > >> >> fail.
>> >>> > > >> >>
>> >>> > > >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >>> > > >> >> dt node 'linux,usable-memory-range' >> >>> > > >> > >> >>> > > >> > I still don't understand why we need to carry over the information >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >>> > > >> > such regions are free to be reused by the kernel after some point of >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? >> >>> > > >> > >> >>> > > >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec >> >>> > > >> kernel, those regions needs to be preserved, which is why they are >> >>> > > >> memblock_reserve()'d now. >> >>> > > > >> >>> > > > For my better understandings, who is actually accessing such regions >> >>> > > > during boot time, uefi itself or efistub? >> >>> > > > >> >>> > > >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For >> >>> > > instance, on QEMU we have >> >>> > > >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >>> > > 01000013) >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >>> > > BXPC 00000001) >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >>> > > BXPC 00000001) >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >>> > > BXPC 00000001) >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >>> > > BXPC 00000001) >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >>> > > BXPC 00000001) >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >>> > > BXPC 00000001) >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >>> > > BXPC 00000001) >> >>> > > >> >>> > > covered by >> >>> > > >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >>> > > ... >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >>> > >> >>> > OK. I mistakenly understood those regions could be freed after exiting >> >>> > UEFI boot services. >> >>> > >> >>> > > >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >> >>> > > >> when booting the next kernel. >> >>> > > > >> >>> > > > not really. >> >>> > > > >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >> >>> > > >> > on crash dump kernel?) >> >>> > > >> > >> >>> > > >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >> >>> > > >> regions only revealed the bug, not created it (given that other >> >>> > > >> memblock_reserve regions may be affected as well) >> >>> > > > >> >>> > > > As whether we should honor such reserved regions over kexec'ing >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >> >>> > > > As a matter of fact, no information about "reserved" memblocks is >> >>> > > > exposed to user space (via proc/iomem). >> >>> > > > >> >>> > > >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them >> >>> > > as 'System RAM'. Do you think that could solve this? >> >>> > >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >> >>> > marking them under another name in /proc/iomem would also be good in order >> >>> > not to allocate them as part of crash kernel's memory. >> >>> > >> >>> > But I'm not still convinced that we should export them in useable- >> >>> > memory-range to crash dump kernel. They will be accessed through >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram >> >>> > (or memblocks), I guess. >> >>> > -> Bhupesh? >> >>> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >> >>> the memory according to the efi memmap? For kdump kernel anything other >> >>> than usable memory (which is from the dt node instead) should be >> >>> reinitialized according to efi passed info, no? >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory >> >> in (u)efi_init() and then trimmed down to the exact range specified as >> >> usable-memory-range by fdt_enforce_memory_region(). >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well >> >> with multiple entries in usable-memory-range. 
>> >> >> > >> > In any case, the root of the problem is that memory regions lose their >> > 'memory' annotation due to the way the memory map is mangled before >> > being supplied to the kexec kernel. >> > >> > Would it be possible to classify all memory that we want to hide from >> > the kexec kernel as NOMAP instead? That way, it will not be mapped >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), >> > so this seems to be the most appropriate way to deal with the host >> > kernel's memory contents. >> >> Hmm. wouldn't appending the acpi reclaim regions to >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel >> be better? Because its indirectly achieving a similar objective >> (although may be a subset of all System RAM regions on the primary >> kernel's memory). >> >> I am not aware of the background about the current kexec-tools >> implementation where we add only the crashkernel range to the dtb >> being passed to the crashkernel. >> >> Probably Akashi can answer better, as to how we arrived at this design >> approach and why we didn't want to expose all System RAM regions (i.e. >> ! NOMPAP regions) to the crashkernel. >> >> I am suspecting that some issues were seen/meet when the System RAM (! >> NOMAP regions) were exposed to the crashkernel, and that's why we >> finalized on this design approach, but this is something which is just >> my guess. >> >> Regards, >> Bhupesh >> >> >>> > >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >>> > via a kernel command line parameter, "memmap=". >> >>> >> >>> memmap= is only used in old kexec-tools, now we are passing them via >> >>> e820 table. >> >> >> >> Thanks. I remember that you have explained it before. 
>> >>
>> >> -Takahiro AKASHI
>> >>
>> >>> [snip]
>> >>>
>> >>> Thanks
>> >>> Dave
>
> ===8<==
> From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Date: Thu, 21 Dec 2017 19:14:23 +0900
> Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>
> ---
>  arch/arm64/mm/init.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 00e7b900ca41..8175db94257b 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>  	struct memblock_region reg = {
>  		.size = 0,
>  	};
> +	u64 idx;
> +	phys_addr_t start, end;
>
>  	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>
> -	if (reg.size)
> -		memblock_cap_memory_range(reg.base, reg.size);
> +	if (reg.size) {
> +		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> +					&start, &end, NULL)
> +			memblock_mark_nomap(start, end - start);
> +		memblock_clear_nomap(reg.base, reg.size);
> +	}
>  }
>
>  void __init arm64_memblock_init(void)
> --
> 2.15.1
>

Thanks for the patch. After applying this on top of
4.15.0-rc4-next-20171220, there seems to be an improvement and the
crashkernel boot no longer hangs while trying to access the ACPI
tables.

However I notice a minor issue.
Please see the log below for reference: the following message keeps
spamming the console, but I see the crashkernel boot proceed further:

[    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
[    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
[    0.000000] NUMA: NODE_DATA(1) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
[    0.000000] NUMA: NODE_DATA(2) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
[    0.000000] NUMA: NODE_DATA(3) on node 0
[    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
[    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
[    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
[    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
[    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
[    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
[snip..]
[    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

This WARNING message seems to come from vmemmap_verify() inside
'mm/sparse-vmemmap.c'.

Regards,
Bhupesh

^ permalink raw reply	[flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-21 12:06 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-21 12:06 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec Hello Akashi, On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > Bhupesh, > > Can you test the patch attached below, please? > > It is intended to retain already-reserved regions (ACPI reclaim memory > in this case) in system ram (i.e. memblock.memory) without explicitly > exporting them via usable-memory-range. > (I still have to figure out what the side-effect of this patch is.) > > Thanks, > -Takahiro AKASHI > > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel >> <ard.biesheuvel@linaro.org> wrote: >> > On 19 December 2017 at 07:09, AKASHI Takahiro >> > <takahiro.akashi@linaro.org> wrote: >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> >>> > > <takahiro.akashi@linaro.org> wrote: >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >>> > > >> <takahiro.akashi@linaro.org> wrote: >> >>> > > >> > Bhupesh, Ard, >> >>> > > >> > >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >>> > > >> >> Hi Ard, Akashi >> >>> > > >> >> >> >>> > > >> > (snip) >> >>> > > >> > >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >>> > > >> 
>> identify its own usable memory and exclude, at its boot time, any >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >>> > > >> >> , for details) >> >>> > > >> > >> >>> > > >> > Right. >> >>> > > >> > >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only >> >>> > > >> >> with the crashkernel memory range: >> >>> > > >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >>> > > >> >> address_cells, size_cells); >> >>> > > >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >>> > > >> >> , for details) >> >>> > > >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >>> > > >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >>> > > >> >> ACPI memory and crashes while trying to access the same: >> >>> > > >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >>> > > >> >> -r`.img --reuse-cmdline -d >> >>> > > >> >> >> >>> > > >> >> [snip..] 
>> >>> > > >> >>
>> >>> > > >> >> Reserved memory range
>> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >>> > > >> >>
>> >>> > > >> >> Coredump memory ranges
>> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >>> > > >> >>
>> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >>> > > >> >>
>> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >>> > > >> >> {
>> >>> > > >> >>         struct memblock_region reg = {
>> >>> > > >> >>                 .size = 0,
>> >>> > > >> >>         };
>> >>> > > >> >>
>> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >>> > > >> >>
>> >>> > > >> >>         if (reg.size)
>> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /*
>> >>> > > >> >> comment this out */
>> >>> > > >> >> }
>> >>> > > >> >
>> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >>> > > >> > memory contents of the *crashed* kernel.
>> >>> > > >> >
>> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >>> > > >> >>
>> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >>> > > >> >> fail.
>> >>> > > >> >>
>> >>> > > >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are
>> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
>> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
>> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add them to the
>> >>> > > >> >> dt node 'linux,usable-memory-range'
>> >>> > > >> >
>> >>> > > >> > I still don't understand why we need to carry over the information
>> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understanding,
>> >>> > > >> > such regions are free to be reused by the kernel after some point of
>> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
>> >>> > > >> >
>> >>> > > >>
>> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
>> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
>> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
>> >>> > > >> kernel, those regions need to be preserved, which is why they are
>> >>> > > >> memblock_reserve()'d now.
>> >>> > > >
>> >>> > > > For my better understanding, who is actually accessing such regions
>> >>> > > > during boot time, uefi itself or efistub?
>> >>> > > >
>> >>> > >
>> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
>> >>> > > instance, on QEMU we have
>> >>> > >
>> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
>> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 01000013)
>> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 BXPC 00000001)
>> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 BXPC 00000001)
>> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 BXPC 00000001)
>> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 BXPC 00000001)
>> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 BXPC 00000001)
>> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 BXPC 00000001)
>> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 BXPC 00000001)
>> >>> > >
>> >>> > > covered by
>> >>> > >
>> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
>> >>> > > ...
>> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
>> >>> >
>> >>> > OK. I mistakenly understood those regions could be freed after exiting
>> >>> > UEFI boot services.
>> >>> >
>> >>> > >
>> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
>> >>> > > >> when booting the next kernel.
>> >>> > > >
>> >>> > > > Not really.
>> >>> > > >
>> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
>> >>> > > >> > on crash dump kernel?)
>> >>> > > >> >
>> >>> > > >>
>> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
>> >>> > > >> regions only revealed the bug, not created it (given that other
>> >>> > > >> memblock_reserve regions may be affected as well)
>> >>> > > >
>> >>> > > > Whether we should honor such reserved regions across kexec
>> >>> > > > depends on each one's specific nature, so we will have to handle them one by one.
>> >>> > > > As a matter of fact, no information about "reserved" memblocks is
>> >>> > > > exposed to user space (via /proc/iomem).
>> >>> > > >
>> >>> > >
>> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
>> >>> > > as 'System RAM'. Do you think that could solve this?
>> >>> >
>> >>> > Memblock-reserving them is necessary to prevent their corruption, and
>> >>> > marking them under another name in /proc/iomem would also be good in order
>> >>> > not to allocate them as part of crash kernel's memory.
>> >>> >
>> >>> > But I'm still not convinced that we should export them in
>> >>> > usable-memory-range to crash dump kernel. They will be accessed through
>> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
>> >>> > (or memblocks), I guess.
>> >>> > -> Bhupesh?
>> >>>
>> >>> I forgot how the arm64 kernel retrieves the memory ranges and initializes
>> >>> them. If there is no "e820"-like interface, shouldn't the kernel reinitialize all
>> >>> the memory according to the efi memmap? For kdump kernel anything other
>> >>> than usable memory (which is from the dt node instead) should be
>> >>> reinitialized according to efi passed info, no?
>> >>
>> >> All the regions exported in efi memmap will be added to memblock.memory
>> >> in (u)efi_init() and then trimmed down to the exact range specified as
>> >> usable-memory-range by fdt_enforce_memory_region().
>> >>
>> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> with multiple entries in usable-memory-range.
>> >>
>> >
>> > In any case, the root of the problem is that memory regions lose their
>> > 'memory' annotation due to the way the memory map is mangled before
>> > being supplied to the kexec kernel.
>> >
>> > Would it be possible to classify all memory that we want to hide from
>> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> > so this seems to be the most appropriate way to deal with the host
>> > kernel's memory contents.
>>
>> Hmm. Wouldn't appending the ACPI reclaim regions to
>> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> be better? Because it's indirectly achieving a similar objective
>> (although it may be a subset of all System RAM regions in the primary
>> kernel's memory).
>>
>> I am not aware of the background about the current kexec-tools
>> implementation where we add only the crashkernel range to the dtb
>> being passed to the crashkernel.
>>
>> Probably Akashi can answer better, as to how we arrived at this design
>> approach and why we didn't want to expose all System RAM regions (i.e.
>> ! NOMAP regions) to the crashkernel.
>>
>> I suspect that some issues were seen/met when the System RAM (!
>> NOMAP regions) were exposed to the crashkernel, and that's why we
>> finalized on this design approach, but this is just
>> my guess.
>>
>> Regards,
>> Bhupesh
>>
>> >>> >
>> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
>> >>> > via a kernel command line parameter, "memmap=".
>> >>>
>> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >>> e820 table.
>> >>
>> >> Thanks. I remember that you have explained it before.
>> >>
>> >> -Takahiro AKASHI
>> >>
>> >>> [snip]
>> >>>
>> >>> Thanks
>> >>> Dave
>
> ===8<==
> From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Date: Thu, 21 Dec 2017 19:14:23 +0900
> Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>
> ---
>  arch/arm64/mm/init.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 00e7b900ca41..8175db94257b 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>          struct memblock_region reg = {
>                  .size = 0,
>          };
> +        u64 idx;
> +        phys_addr_t start, end;
>
>          of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>
> -        if (reg.size)
> -                memblock_cap_memory_range(reg.base, reg.size);
> +        if (reg.size) {
> +                for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> +                                        &start, &end, NULL)
> +                        memblock_mark_nomap(start, end - start);
> +                memblock_clear_nomap(reg.base, reg.size);
> +        }
>  }
>
>  void __init arm64_memblock_init(void)
> --
> 2.15.1
>

Thanks for the patch. After applying this on top of
4.15.0-rc4-next-20171220, there seems to be an improvement and the
crashkernel boot no longer hangs while trying to access the ACPI
tables.

However, I notice a minor issue. Please see the log below for reference:
the following message keeps spamming the console, but the crashkernel
boot does proceed further:

[    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
[    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
[    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
[    0.000000] NUMA: NODE_DATA(1) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
[    0.000000] NUMA: NODE_DATA(2) on node 0
[    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
[    0.000000] NUMA: NODE_DATA(3) on node 0
[    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
[    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
[    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
[    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
[    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
[    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
[snip..]
[    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

This WARNING message seems to come from vmemmap_verify() inside
'mm/sparse-vmemmap.c'.

Regards,
Bhupesh
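The intent of the patch quoted in this message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct region`, `inside()`, `enforce_usable_range()` and `count_mapped()` are hypothetical names made up for the illustration, not kernel APIs, and the model assumes regions are already split so that none straddles the boundary of the usable range. It shows the idea Akashi describes: free memory outside the range from 'linux,usable-memory-range' is flagged NOMAP instead of being removed, while already-reserved regions (such as ACPI reclaim memory) are left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of one memblock.memory region. */
struct region {
    uint64_t base;
    uint64_t size;
    bool reserved; /* models memblock_reserve()'d memory, e.g. ACPI reclaim */
    bool nomap;    /* models the MEMBLOCK_NOMAP flag */
};

/* True if [base, base + size) lies entirely inside [ubase, ubase + usize). */
static bool inside(uint64_t base, uint64_t size, uint64_t ubase, uint64_t usize)
{
    return base >= ubase && base + size <= ubase + usize;
}

/*
 * Models the patched fdt_enforce_memory_region(): free (non-reserved)
 * regions outside the usable range become NOMAP; reserved regions are
 * skipped, mirroring how for_each_free_mem_range() skips reserved memory.
 * Simplification (assumption): no region straddles the usable range.
 */
static void enforce_usable_range(struct region *regs, size_t n,
                                 uint64_t ubase, uint64_t usize)
{
    assert(usize > 0);
    for (size_t i = 0; i < n; i++) {
        if (regs[i].reserved)
            continue;
        regs[i].nomap = !inside(regs[i].base, regs[i].size, ubase, usize);
    }
}

/* Number of regions that would still be mapped as ordinary memory. */
static size_t count_mapped(const struct region *regs, size_t n)
{
    size_t mapped = 0;

    for (size_t i = 0; i < n; i++)
        if (!regs[i].nomap)
            mapped++;
    return mapped;
}
```

For instance, with a free region below the crashkernel range, the crashkernel range 0x0e800000-0x2e7fffff itself, and a reserved region standing in for the ACPI tables, calling `enforce_usable_range(regs, 3, 0x0e800000, 0x20000000)` would leave only the first region flagged NOMAP. The address used for the reserved region in such a test is an assumption taken from the gap in the coredump ranges, not a value confirmed in the thread.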
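The alternative floated in item 6a of the quoted discussion (user space picking ACPI reclaim regions out of '/proc/iomem' and adding them to 'linux,usable-memory-range') could look roughly like the following. This is a hedged sketch, not kexec-tools code: `struct mem_range`, `parse_iomem_line()` and `collect_ranges()` are hypothetical helpers, and the label "ACPI reclaim region" is an assumed name for an entry the kernel would have to export first.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct mem_range {
    uint64_t start;
    uint64_t end; /* inclusive, as printed in /proc/iomem */
};

/*
 * Parse one "/proc/iomem"-style line ("start-end : name") and report
 * whether its name begins with the given label.
 */
static bool parse_iomem_line(const char *line, const char *label,
                             struct mem_range *out)
{
    uint64_t start, end;
    int consumed = 0;

    /* %n records where the name starts; it does not count as a conversion. */
    if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %n",
               &start, &end, &consumed) != 2)
        return false;
    if (consumed == 0) /* the " : " separator was missing */
        return false;
    if (strncmp(line + consumed, label, strlen(label)) != 0)
        return false;

    out->start = start;
    out->end = end;
    return true;
}

/*
 * Collect every range whose name matches one of the labels, storing at
 * most max entries. Returns the number of ranges stored.
 */
static size_t collect_ranges(const char *const *lines, size_t nlines,
                             const char *const *labels, size_t nlabels,
                             struct mem_range *out, size_t max)
{
    size_t count = 0;

    for (size_t i = 0; i < nlines; i++) {
        for (size_t j = 0; j < nlabels && count < max; j++) {
            if (parse_iomem_line(lines[i], labels[j], &out[count])) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

The collected ranges would then be what a tool writes into the device tree property. Whether the crash kernel should receive more than the crashkernel range at all is exactly the open question in this thread, so the sketch only shows the mechanics.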
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-21 12:06 ` Bhupesh Sharma
@ 2017-12-22  8:33 ` AKASHI Takahiro
From: AKASHI Takahiro @ 2017-12-22 8:33 UTC
To: Bhupesh Sharma
Cc: Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> Hello Akashi,
>
> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> > Bhupesh,
> >
> > Can you test the patch attached below, please?
> >
> > It is intended to retain already-reserved regions (ACPI reclaim memory
> > in this case) in system ram (i.e. memblock.memory) without explicitly
> > exporting them via usable-memory-range.
> > (I still have to figure out what the side-effect of this patch is.)
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> >> >>> > > >> > Bhupesh, Ard,
> >> >>> > > >> >
> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >>> > > >> >> Hi Ard, Akashi
> >> >>> > > >> >>
> >> >>> > > >> > (snip)
> >> >>> > > >> >
> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt for details)
> >> >>> > > >> >
> >> >>> > > >> > Right.
> >> >>> > > >> >
> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >>> > > >> >> with the crashkernel memory range:
> >> >>> > > >> >>
> >> >>> > > >> >>         /* add linux,usable-memory-range */
> >> >>> > > >> >>         nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >>> > > >> >>         result = fdt_setprop_range(new_buf, nodeoffset,
> >> >>> > > >> >>                         PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >>> > > >> >>                         address_cells, size_cells);
> >> >>> > > >> >>
> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 for details)
> >> >>> > > >> >>
> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >>> > > >> >> they are marked as System RAM or as RESERVED, as the
> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >>> > > >> >>
> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >>> > > >> >>
> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >> >>> > > >> >>
> >> >>> > > >> >> [snip..]
> >> >>> > > >> >>
> >> >>> > > >> >> Reserved memory range
> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> Coredump memory ranges
> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >>> > > >> >>
> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment out the
> >> >>> > > >> >> memory cap'ing applied to the crash kernel inside
> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >>> > > >> >>
> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >>> > > >> >> {
> >> >>> > > >> >>         struct memblock_region reg = {
> >> >>> > > >> >>                 .size = 0,
> >> >>> > > >> >>         };
> >> >>> > > >> >>
> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >>> > > >> >>
> >> >>> > > >> >>         if (reg.size)
> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >>> > > >> >> }
> >> >>> > > >> >
> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >>> > > >> >
> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >>> > > >> >>
> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >>> > > >> >> fail.
> >> >>> > > >> >>
> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add them to the
> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >>> > > >> >
> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understanding,
> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >>> > > >> memblock_reserve()'d now.
> >> >>> > > >
> >> >>> > > > For my better understanding, who is actually accessing such regions
> >> >>> > > > during boot time, uefi itself or efistub?
> >> >>> > > >
> >> >>> > >
> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >>> > > instance, on QEMU we have
> >> >>> > >
> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 01000013)
> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 BXPC 00000001)
> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 BXPC 00000001)
> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 BXPC 00000001)
> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 BXPC 00000001)
> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 BXPC 00000001)
> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 BXPC 00000001)
> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 BXPC 00000001)
> >> >>> > >
> >> >>> > > covered by
> >> >>> > >
> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >>> > > ...
> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >>> >
> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >>> > UEFI boot services.
> >> >>> > > >> >>> > > > >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >> >>> > > >> when booting the next kernel. > >> >>> > > > > >> >>> > > > not really. > >> >>> > > > > >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> >>> > > >> > on crash dump kernel?) > >> >>> > > >> > > >> >>> > > >> > >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim > >> >>> > > >> regions only revealed the bug, not created it (given that other > >> >>> > > >> memblock_reserve regions may be affected as well) > >> >>> > > > > >> >>> > > > As whether we should honor such reserved regions over kexec'ing > >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. > >> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >> >>> > > > exposed to user space (via proc/iomem). > >> >>> > > > > >> >>> > > > >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >> >>> > > as 'System RAM'. Do you think that could solve this? > >> >>> > > >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >> >>> > marking them under another name in /proc/iomem would also be good in order > >> >>> > not to allocate them as part of crash kernel's memory. > >> >>> > > >> >>> > But I'm not still convinced that we should export them in useable- > >> >>> > memory-range to crash dump kernel. They will be accessed through > >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >> >>> > (or memblocks), I guess. > >> >>> > -> Bhupesh? > >> >>> > >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >> >>> the memory according to the efi memmap? 
For kdump kernel anything other > >> >>> than usable memory (which is from the dt node instead) should be > >> >>> reinitialized according to efi passed info, no? > >> >> > >> >> All the regions exported in efi memmap will be added to memblock.memory > >> >> in (u)efi_init() and then trimmed down to the exact range specified as > >> >> usable-memory-range by fdt_enforce_memory_region(). > >> >> > >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> >> with multiple entries in usable-memory-range. > >> >> > >> > > >> > In any case, the root of the problem is that memory regions lose their > >> > 'memory' annotation due to the way the memory map is mangled before > >> > being supplied to the kexec kernel. > >> > > >> > Would it be possible to classify all memory that we want to hide from > >> > the kexec kernel as NOMAP instead? That way, it will not be mapped > >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > >> > so this seems to be the most appropriate way to deal with the host > >> > kernel's memory contents. > >> > >> Hmm. wouldn't appending the acpi reclaim regions to > >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel > >> be better? Because its indirectly achieving a similar objective > >> (although may be a subset of all System RAM regions on the primary > >> kernel's memory). > >> > >> I am not aware of the background about the current kexec-tools > >> implementation where we add only the crashkernel range to the dtb > >> being passed to the crashkernel. > >> > >> Probably Akashi can answer better, as to how we arrived at this design > >> approach and why we didn't want to expose all System RAM regions (i.e. > >> ! NOMPAP regions) to the crashkernel. > >> > >> I am suspecting that some issues were seen/meet when the System RAM (! 
> >> NOMAP regions) were exposed to the crashkernel, and that's why we > >> finalized on this design approach, but this is something which is just > >> my guess. > >> > >> Regards, > >> Bhupesh > >> > >> >>> > > >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> >>> > via a kernel command line parameter, "memmap=". > >> >>> > >> >>> memmap= is only used in old kexec-tools, now we are passing them via > >> >>> e820 table. > >> >> > >> >> Thanks. I remember that you have explained it before. > >> >> > >> >> -Takahiro AKASHI > >> >> > >> >>> [snip] > >> >>> > >> >>> Thanks > >> >>> Dave > > > > ===8<== > > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001 > > From: AKASHI Takahiro <takahiro.akashi@linaro.org> > > Date: Thu, 21 Dec 2017 19:14:23 +0900 > > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP > > > > --- > > arch/arm64/mm/init.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > index 00e7b900ca41..8175db94257b 100644 > > --- a/arch/arm64/mm/init.c > > +++ b/arch/arm64/mm/init.c > > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void) > > struct memblock_region reg = { > > .size = 0, > > }; > > + u64 idx; > > + phys_addr_t start, end; > > > > of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > > > - if (reg.size) > > - memblock_cap_memory_range(reg.base, reg.size); > > + if (reg.size) { > > + for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE, > > + &start, &end, NULL) > > + memblock_mark_nomap(start, end - start); > > + memblock_clear_nomap(reg.base, reg.size); > > + } > > } > > > > void __init arm64_memblock_init(void) > > -- > > 2.15.1 > > > > Thanks for the patch. After applying this on top of > 4.15.0-rc4-next-20171220, there seems to be a improvement and the > crashkernel boot no longer hangs while trying to access the acpi > tables.
> > However I notice a minor issue. Please see the log below for > reference, the following message keeps spamming the console but I see > the crashkernel boot proceed further.: > > [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3 > [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff] > [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff] > [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff] > [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff] > [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff] > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff] > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff] > [ 0.000000] NUMA: NODE_DATA(1) on node 0 > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff] > [ 0.000000] NUMA: NODE_DATA(2) on node 0 > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff] > [ 0.000000] NUMA: NODE_DATA(3) on node 0 > [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode > page_structs > > [snip..] > [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode > page_structs

These messages show that some "struct page" data are allocated on remote (NUMA) nodes. Since all the usable system memory on your crash dump kernel (starting at 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations. My best guess is that you can ignore them, apart from some performance penalty. This may be one side effect of the patch.

So does your crash dump kernel now boot successfully?
Thanks, -Takahiro AKASHI > This WARNING message seems to come from vmemmap_verify() inside > 'mm/sparse-vmemmap.c' > > Regards, > Bhupesh
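For readers following the thread, the idea behind the patch quoted above can be sketched in user space. This is only a simplified model under stated assumptions, not kernel code: struct sketch_region, sketch_enforce_usable_range() and sketch_is_mapped() are hypothetical names, each region is assumed to lie either entirely inside or entirely outside the usable range (the real memblock code splits regions as needed), and any addresses fed to it are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memblock region (not the kernel's struct). */
struct sketch_region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory, e.g. ACPI reclaim */
	bool nomap;	/* models the MEMBLOCK_NOMAP flag */
};

/* True if the region lies entirely inside [base, base + size). */
bool sketch_region_within(const struct sketch_region *r,
			  uint64_t base, uint64_t size)
{
	return r->base >= base && r->base + r->size <= base + size;
}

/*
 * Models the patched fdt_enforce_memory_region():
 *  1. every free (non-reserved) region is marked NOMAP;
 *  2. NOMAP is cleared again for the usable range.
 * Reserved regions outside the usable range are never touched, so they
 * stay part of "system RAM" and remain mapped.
 */
void sketch_enforce_usable_range(struct sketch_region *regions, size_t count,
				 uint64_t usable_base, uint64_t usable_size)
{
	size_t i;

	assert(regions != NULL || count == 0);

	if (!usable_size)
		return;	/* no usable-memory-range given: leave everything as-is */

	for (i = 0; i < count; i++) {
		if (!regions[i].reserved)
			regions[i].nomap = true;
	}

	for (i = 0; i < count; i++) {
		if (sketch_region_within(&regions[i], usable_base, usable_size))
			regions[i].nomap = false;
	}
}

/* True if addr falls in a known region that is not NOMAP. */
bool sketch_is_mapped(const struct sketch_region *regions, size_t count,
		      uint64_t addr)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (addr >= regions[i].base &&
		    addr < regions[i].base + regions[i].size)
			return !regions[i].nomap;
	}
	return false;
}
```

With a layout resembling the one in the thread (a usable range of 0x20000000 bytes starting at 0x0e800000 and a reserved ACPI region outside it), the old kernel's free memory ends up NOMAP while the reserved ACPI region stays mapped, which is the behaviour the patch is aiming for.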
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-22 8:33 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-22 8:33 UTC (permalink / raw) To: Bhupesh Sharma Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec, James Morse, Bhupesh SHARMA, Dave Young, linux-arm-kernel On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: > Hello Akashi, > > On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: > > Bhupesh, > > > > Can you test the patch attached below, please? > > > > It is intended to retain already-reserved regions (ACPI reclaim memory > > in this case) in system ram (i.e. memblock.memory) without explicitly > > exporting them via usable-memory-range. > > (I still have to figure out what the side-effect of this patch is.) > > > > Thanks, > > -Takahiro AKASHI > > > > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > >> <ard.biesheuvel@linaro.org> wrote: > >> > On 19 December 2017 at 07:09, AKASHI Takahiro > >> > <takahiro.akashi@linaro.org> wrote: > >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >> >>> > > <takahiro.akashi@linaro.org> wrote: > >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >>> > > >> <takahiro.akashi@linaro.org> wrote: > >> >>> > > >> > Bhupesh, Ard, > >> >>> > > >> > > >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >>> > > >> >> Hi Ard, Akashi > >> >>> > > >> >> > >> >>> > > >> > (snip) > >> >>> > > >> > > >> >>> > > >> >> Looking deeper into the issue, since the arm64 
kexec-tools uses the > >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any > >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. > >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >>> > > >> >> , for details) > >> >>> > > >> > > >> >>> > > >> > Right. > >> >>> > > >> > > >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >> >>> > > >> >> with the crashkernel memory range: > >> >>> > > >> >> > >> >>> > > >> >> /* add linux,usable-memory-range */ > >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >>> > > >> >> address_cells, size_cells); > >> >>> > > >> >> > >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >>> > > >> >> , for details) > >> >>> > > >> >> > >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >>> > > >> >> > >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >>> > > >> >> ACPI memory and crashes while trying to access the same: > >> >>> > > >> >> > >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >>> > > >> >> -r`.img --reuse-cmdline -d > >> >>> > > >> >> > >> >>> > > >> >> [snip..] 
> >> >>> > > >> >> > >> >>> > > >> >> Reserved memory range > >> >>> > > >> >> 000000000e800000-000000002e7fffff (0) > >> >>> > > >> >> > >> >>> > > >> >> Coredump memory ranges > >> >>> > > >> >> 0000000000000000-000000000e7fffff (0) > >> >>> > > >> >> 000000002e800000-000000003961ffff (0) > >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0) > >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0) > >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0) > >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0) > >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0) > >> >>> > > >> >> 000000a000000000-000000affbffffff (0) > >> >>> > > >> >> > >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the > >> >>> > > >> >> memory cap'ing passed to the crash kernel inside > >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below): > >> >>> > > >> >> > >> >>> > > >> >> static void __init fdt_enforce_memory_region(void) > >> >>> > > >> >> { > >> >>> > > >> >> struct memblock_region reg = { > >> >>> > > >> >> .size = 0, > >> >>> > > >> >> }; > >> >>> > > >> >> > >> >>> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> >>> > > >> >> > >> >>> > > >> >> if (reg.size) > >> >>> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> >>> > > >> >> comment this out */ > >> >>> > > >> >> } > >> >>> > > >> > > >> >>> > > >> > Please just don't do that. It can cause a fatal damage on > >> >>> > > >> > memory contents of the *crashed* kernel. > >> >>> > > >> > > >> >>> > > >> >> 5). Both the above temporary solutions fix the problem. > >> >>> > > >> >> > >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >>> > > >> >> fail. > >> >>> > > >> >> > >> >>> > > >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are > >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >>> > > >> >> dt node 'linux,usable-memory-range' > >> >>> > > >> > > >> >>> > > >> > I still don't understand why we need to carry over the information > >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >>> > > >> > such regions are free to be reused by the kernel after some point of > >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >> >>> > > >> > > >> >>> > > >> > >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >> >>> > > >> kernel, those regions needs to be preserved, which is why they are > >> >>> > > >> memblock_reserve()'d now. > >> >>> > > > > >> >>> > > > For my better understandings, who is actually accessing such regions > >> >>> > > > during boot time, uefi itself or efistub? > >> >>> > > > > >> >>> > > > >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >> >>> > > instance, on QEMU we have > >> >>> > > > >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >>> > > 01000013) > >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > > >> >>> > > covered by > >> >>> > > > >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >>> > > ... > >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >>> > > >> >>> > OK. I mistakenly understood those regions could be freed after exiting > >> >>> > UEFI boot services. > >> >>> > > >> >>> > > > >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >> >>> > > >> when booting the next kernel. > >> >>> > > > > >> >>> > > > not really. > >> >>> > > > > >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> >>> > > >> > on crash dump kernel?) > >> >>> > > >> > > >> >>> > > >> > >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim > >> >>> > > >> regions only revealed the bug, not created it (given that other > >> >>> > > >> memblock_reserve regions may be affected as well) > >> >>> > > > > >> >>> > > > As whether we should honor such reserved regions over kexec'ing > >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. > >> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >> >>> > > > exposed to user space (via proc/iomem). > >> >>> > > > > >> >>> > > > >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >> >>> > > as 'System RAM'. Do you think that could solve this? > >> >>> > > >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >> >>> > marking them under another name in /proc/iomem would also be good in order > >> >>> > not to allocate them as part of crash kernel's memory. > >> >>> > > >> >>> > But I'm not still convinced that we should export them in useable- > >> >>> > memory-range to crash dump kernel. They will be accessed through > >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >> >>> > (or memblocks), I guess. > >> >>> > -> Bhupesh? > >> >>> > >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >> >>> the memory according to the efi memmap? For kdump kernel anything other > >> >>> than usable memory (which is from the dt node instead) should be > >> >>> reinitialized according to efi passed info, no? > >> >> > >> >> All the regions exported in efi memmap will be added to memblock.memory > >> >> in (u)efi_init() and then trimmed down to the exact range specified as > >> >> usable-memory-range by fdt_enforce_memory_region(). > >> >> > >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> >> with multiple entries in usable-memory-range. 
> >> >> > >> > > >> > In any case, the root of the problem is that memory regions lose their > >> > 'memory' annotation due to the way the memory map is mangled before > >> > being supplied to the kexec kernel. > >> > > >> > Would it be possible to classify all memory that we want to hide from > >> > the kexec kernel as NOMAP instead? That way, it will not be mapped > >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > >> > so this seems to be the most appropriate way to deal with the host > >> > kernel's memory contents. > >> > >> Hmm. wouldn't appending the acpi reclaim regions to > >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel > >> be better? Because its indirectly achieving a similar objective > >> (although may be a subset of all System RAM regions on the primary > >> kernel's memory). > >> > >> I am not aware of the background about the current kexec-tools > >> implementation where we add only the crashkernel range to the dtb > >> being passed to the crashkernel. > >> > >> Probably Akashi can answer better, as to how we arrived at this design > >> approach and why we didn't want to expose all System RAM regions (i.e. > >> ! NOMPAP regions) to the crashkernel. > >> > >> I am suspecting that some issues were seen/meet when the System RAM (! > >> NOMAP regions) were exposed to the crashkernel, and that's why we > >> finalized on this design approach, but this is something which is just > >> my guess. > >> > >> Regards, > >> Bhupesh > >> > >> >>> > > >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> >>> > via a kernel command line parameter, "memmap=". > >> >>> > >> >>> memmap= is only used in old kexec-tools, now we are passing them via > >> >>> e820 table. > >> >> > >> >> Thanks. I remember that you have explained it before. 
> >> >> > >> >> -Takahiro AKASHI > >> >> > >> >>> [snip] > >> >>> > >> >>> Thanks > >> >>> Dave > > > > ===8<== > > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001 > > From: AKASHI Takahiro <takahiro.akashi@linaro.org> > > Date: Thu, 21 Dec 2017 19:14:23 +0900 > > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP > > > > --- > > arch/arm64/mm/init.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > index 00e7b900ca41..8175db94257b 100644 > > --- a/arch/arm64/mm/init.c > > +++ b/arch/arm64/mm/init.c > > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void) > > struct memblock_region reg = { > > .size = 0, > > }; > > + u64 idx; > > + phys_addr_t start, end; > > > > of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > > > > - if (reg.size) > > - memblock_cap_memory_range(reg.base, reg.size); > > + if (reg.size) { > > + for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE, > > + &start, &end, NULL) > > + memblock_mark_nomap(start, end - start); > > + memblock_clear_nomap(reg.base, reg.size); > > + } > > } > > > > void __init arm64_memblock_init(void) > > -- > > 2.15.1 > > > > Thanks for the patch. After applying this on top of > 4.15.0-rc4-next-20171220, there seems to be a improvement and the > crashkernel boot no longer hangs while trying to access the acpi > tables. > > However I notice a minor issue.
Please see the log below for > reference, the following message keeps spamming the console but I see > the crashkernel boot proceed further.: > > [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3 > [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff] > [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff] > [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff] > [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff] > [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff] > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff] > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff] > [ 0.000000] NUMA: NODE_DATA(1) on node 0 > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff] > [ 0.000000] NUMA: NODE_DATA(2) on node 0 > [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff] > [ 0.000000] NUMA: NODE_DATA(3) on node 0 > [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode > page_structs > [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode > page_structs > > [snip..] > [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode > page_structs

These messages show that some "struct page" data are allocated on remote (NUMA) nodes. Since all the usable system memory on your crash dump kernel (starting at 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations. My best guess is that you can ignore them, apart from some performance penalty. This may be one side effect of the patch.

So does your crash dump kernel now boot successfully?
Thanks, -Takahiro AKASHI > This WARNING message seems to come from vmemmap_verify() inside > 'mm/sparse-vmemmap.c' > > Regards, > Bhupesh
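The "potential offnode page_structs" warning discussed above boils down to a node-distance comparison: the memory backing a node's "struct page" array is looked up, and a warning is printed when it sits on a different node from the one being described. A rough user-space sketch of that check follows. It is only loosely modelled on vmemmap_verify() and makes several assumptions: all sketch_* names are hypothetical, the node ranges would come from something like the SRAT lines in the log, and the distance values 10 and 20 are assumed to mirror the kernel's usual local and remote defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's default local/remote NUMA distances. */
#define SKETCH_LOCAL_DISTANCE	10
#define SKETCH_REMOTE_DISTANCE	20

/* Hypothetical description of one SRAT-style memory range. */
struct sketch_node_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
	int nid;
};

/* Returns the node owning addr, or -1 if no range covers it. */
int sketch_addr_to_nid(const struct sketch_node_range *ranges, size_t count,
		       uint64_t addr)
{
	size_t i;

	assert(ranges != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (addr >= ranges[i].start && addr <= ranges[i].end)
			return ranges[i].nid;
	}
	return -1;
}

/* Flat two-level distance table: same node is local, anything else remote. */
int sketch_node_distance(int a, int b)
{
	return a == b ? SKETCH_LOCAL_DISTANCE : SKETCH_REMOTE_DISTANCE;
}

/*
 * True if the memory at backing_addr, used to hold the page structs of
 * described_nid, lives on a different node, i.e. the situation that
 * triggers the warning in the log.
 */
bool sketch_is_offnode(const struct sketch_node_range *ranges, size_t count,
		       uint64_t backing_addr, int described_nid)
{
	int actual_nid = sketch_addr_to_nid(ranges, count, backing_addr);

	if (actual_nid < 0)
		return false;	/* unknown owner: nothing to compare against */

	return sketch_node_distance(actual_nid, described_nid) >
	       SKETCH_LOCAL_DISTANCE;
}
```

Because the crash dump kernel can only allocate from its usable range on Node#0, any page structs it sets up for memory on nodes 1 to 3 are necessarily backed by Node#0 memory, so a check of this kind fires for them and stays quiet for Node#0 itself.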
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-22 8:33 ` AKASHI Takahiro 0 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-22 8:33 UTC (permalink / raw) To: linux-arm-kernel On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: > Hello Akashi, > > On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: > > Bhupesh, > > > > Can you test the patch attached below, please? > > > > It is intended to retain already-reserved regions (ACPI reclaim memory > > in this case) in system ram (i.e. memblock.memory) without explicitly > > exporting them via usable-memory-range. > > (I still have to figure out what the side-effect of this patch is.) > > > > Thanks, > > -Takahiro AKASHI > > > > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > >> <ard.biesheuvel@linaro.org> wrote: > >> > On 19 December 2017 at 07:09, AKASHI Takahiro > >> > <takahiro.akashi@linaro.org> wrote: > >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >> >>> > > <takahiro.akashi@linaro.org> wrote: > >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >>> > > >> <takahiro.akashi@linaro.org> wrote: > >> >>> > > >> > Bhupesh, Ard, > >> >>> > > >> > > >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >>> > > >> >> Hi Ard, Akashi > >> >>> > > >> >> > >> >>> > > >> > (snip) > >> >>> > > >> > > >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >>> > > >> 
>> identify its own usable memory and exclude, at its boot time, any > >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. > >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >>> > > >> >> , for details) > >> >>> > > >> > > >> >>> > > >> > Right. > >> >>> > > >> > > >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only > >> >>> > > >> >> with the crashkernel memory range: > >> >>> > > >> >> > >> >>> > > >> >> /* add linux,usable-memory-range */ > >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >>> > > >> >> address_cells, size_cells); > >> >>> > > >> >> > >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >>> > > >> >> , for details) > >> >>> > > >> >> > >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >>> > > >> >> > >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >>> > > >> >> ACPI memory and crashes while trying to access the same: > >> >>> > > >> >> > >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >>> > > >> >> -r`.img --reuse-cmdline -d > >> >>> > > >> >> > >> >>> > > >> >> [snip..] 
> >> >>> > > >> >> > >> >>> > > >> >> Reserved memory range > >> >>> > > >> >> 000000000e800000-000000002e7fffff (0) > >> >>> > > >> >> > >> >>> > > >> >> Coredump memory ranges > >> >>> > > >> >> 0000000000000000-000000000e7fffff (0) > >> >>> > > >> >> 000000002e800000-000000003961ffff (0) > >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0) > >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0) > >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0) > >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0) > >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0) > >> >>> > > >> >> 000000a000000000-000000affbffffff (0) > >> >>> > > >> >> > >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the > >> >>> > > >> >> memory cap'ing passed to the crash kernel inside > >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below): > >> >>> > > >> >> > >> >>> > > >> >> static void __init fdt_enforce_memory_region(void) > >> >>> > > >> >> { > >> >>> > > >> >> struct memblock_region reg = { > >> >>> > > >> >> .size = 0, > >> >>> > > >> >> }; > >> >>> > > >> >> > >> >>> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, &reg); > >> >>> > > >> >> > >> >>> > > >> >> if (reg.size) > >> >>> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> >>> > > >> >> comment this out */ > >> >>> > > >> >> } > >> >>> > > >> > > >> >>> > > >> > Please just don't do that. It can cause a fatal damage on > >> >>> > > >> > memory contents of the *crashed* kernel. > >> >>> > > >> > > >> >>> > > >> >> 5). Both the above temporary solutions fix the problem. > >> >>> > > >> >> > >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >>> > > >> >> fail. > >> >>> > > >> >> > >> >>> > > >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are > >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >>> > > >> >> dt node 'linux,usable-memory-range' > >> >>> > > >> > > >> >>> > > >> > I still don't understand why we need to carry over the information > >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >>> > > >> > such regions are free to be reused by the kernel after some point of > >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >> >>> > > >> > > >> >>> > > >> > >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >> >>> > > >> kernel, those regions needs to be preserved, which is why they are > >> >>> > > >> memblock_reserve()'d now. > >> >>> > > > > >> >>> > > > For my better understandings, who is actually accessing such regions > >> >>> > > > during boot time, uefi itself or efistub? > >> >>> > > > > >> >>> > > > >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >> >>> > > instance, on QEMU we have > >> >>> > > > >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >>> > > 01000013) > >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >>> > > BXPC 00000001) > >> >>> > > > >> >>> > > covered by > >> >>> > > > >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >>> > > ... > >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >>> > > >> >>> > OK. I mistakenly understood those regions could be freed after exiting > >> >>> > UEFI boot services. > >> >>> > > >> >>> > > > >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >> >>> > > >> when booting the next kernel. > >> >>> > > > > >> >>> > > > not really. > >> >>> > > > > >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> >>> > > >> > on crash dump kernel?) > >> >>> > > >> > > >> >>> > > >> > >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim > >> >>> > > >> regions only revealed the bug, not created it (given that other > >> >>> > > >> memblock_reserve regions may be affected as well) > >> >>> > > > > >> >>> > > > As whether we should honor such reserved regions over kexec'ing > >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. > >> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >> >>> > > > exposed to user space (via proc/iomem). > >> >>> > > > > >> >>> > > > >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >> >>> > > as 'System RAM'. Do you think that could solve this? > >> >>> > > >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >> >>> > marking them under another name in /proc/iomem would also be good in order > >> >>> > not to allocate them as part of crash kernel's memory. > >> >>> > > >> >>> > But I'm not still convinced that we should export them in useable- > >> >>> > memory-range to crash dump kernel. They will be accessed through > >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >> >>> > (or memblocks), I guess. > >> >>> > -> Bhupesh? > >> >>> > >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >> >>> the memory according to the efi memmap? For kdump kernel anything other > >> >>> than usable memory (which is from the dt node instead) should be > >> >>> reinitialized according to efi passed info, no? > >> >> > >> >> All the regions exported in efi memmap will be added to memblock.memory > >> >> in (u)efi_init() and then trimmed down to the exact range specified as > >> >> usable-memory-range by fdt_enforce_memory_region(). > >> >> > >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> >> with multiple entries in usable-memory-range. 
> >> >> > >> > > >> > In any case, the root of the problem is that memory regions lose their > >> > 'memory' annotation due to the way the memory map is mangled before > >> > being supplied to the kexec kernel. > >> > > >> > Would it be possible to classify all memory that we want to hide from > >> > the kexec kernel as NOMAP instead? That way, it will not be mapped > >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > >> > so this seems to be the most appropriate way to deal with the host > >> > kernel's memory contents. > >> > >> Hmm. wouldn't appending the acpi reclaim regions to > >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel > >> be better? Because its indirectly achieving a similar objective > >> (although may be a subset of all System RAM regions on the primary > >> kernel's memory). > >> > >> I am not aware of the background about the current kexec-tools > >> implementation where we add only the crashkernel range to the dtb > >> being passed to the crashkernel. > >> > >> Probably Akashi can answer better, as to how we arrived at this design > >> approach and why we didn't want to expose all System RAM regions (i.e. > >> ! NOMPAP regions) to the crashkernel. > >> > >> I am suspecting that some issues were seen/meet when the System RAM (! > >> NOMAP regions) were exposed to the crashkernel, and that's why we > >> finalized on this design approach, but this is something which is just > >> my guess. > >> > >> Regards, > >> Bhupesh > >> > >> >>> > > >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> >>> > via a kernel command line parameter, "memmap=". > >> >>> > >> >>> memmap= is only used in old kexec-tools, now we are passing them via > >> >>> e820 table. > >> >> > >> >> Thanks. I remember that you have explained it before. 
> >> >>
> >> >> -Takahiro AKASHI
> >> >>
> >> >>> [snip]
> >> >>>
> >> >>> Thanks
> >> >>> Dave
> >
> > ===8<==
> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >
> > ---
> >  arch/arm64/mm/init.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 00e7b900ca41..8175db94257b 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >  	struct memblock_region reg = {
> >  		.size = 0,
> >  	};
> > +	u64 idx;
> > +	phys_addr_t start, end;
> >
> >  	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >
> > -	if (reg.size)
> > -		memblock_cap_memory_range(reg.base, reg.size);
> > +	if (reg.size) {
> > +		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> > +					&start, &end, NULL)
> > +			memblock_mark_nomap(start, end - start);
> > +		memblock_clear_nomap(reg.base, reg.size);
> > +	}
> >  }
> >
> >  void __init arm64_memblock_init(void)
> > --
> > 2.15.1
> >
>
> Thanks for the patch. After applying this on top of
> 4.15.0-rc4-next-20171220, there seems to be a improvement and the
> crashkernel boot no longer hangs while trying to access the acpi
> tables.
>
> However I notice a minor issue.
> Please see the log below for
> reference, the following message keeps spamming the console but I see
> the crashkernel boot proceed further.:
>
> [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> [ 0.000000] NUMA: NODE_DATA(1) on node 0
> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> [ 0.000000] NUMA: NODE_DATA(2) on node 0
> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> [ 0.000000] NUMA: NODE_DATA(3) on node 0
> [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
>
> [snip..]
> [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs

These messages show that some "struct page" data are allocated on remote
(NUMA) nodes.
Since, on your crash dump kernel, all the usable system memory (starting at
0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.

My best guess is that you can ignore them, apart from some performance penalty.
This may be one side effect.

So does your crash dump kernel now boot successfully?
Thanks,
-Takahiro AKASHI

> This WARNING message seems to come from vmemmap_verify() inside
> 'mm/sparse-vmemmap.c'
>
> Regards,
> Bhupesh
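For readers following the patch quoted in this message, the idea can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct region`, `inside()` and `enforce_usable_range()` are hypothetical names, not kernel APIs, and the model treats each region as indivisible, whereas the kernel's memblock code works on arbitrary address ranges and splits regions as needed. It shows the intended outcome: free memory outside the usable range becomes NOMAP, the usable (crash kernel) range is mapped again, and reserved regions such as the ACPI tables are left alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memblock region (not the kernel's struct). */
struct region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory, e.g. ACPI tables */
	bool nomap;	/* models MEMBLOCK_NOMAP: left out of the linear mapping */
};

/* True when the region lies entirely inside [ubase, ubase + usize). */
static bool inside(const struct region *r, uint64_t ubase, uint64_t usize)
{
	return r->base >= ubase && r->base + r->size <= ubase + usize;
}

/*
 * Model of the patch's two steps:
 *  1. mark every free (non-reserved) region as NOMAP;
 *  2. clear NOMAP again for the usable (crash kernel) range.
 * Reserved regions are never touched, so they stay mapped.
 */
static void enforce_usable_range(struct region *regs, size_t n,
				 uint64_t ubase, uint64_t usize)
{
	for (size_t i = 0; i < n; i++)
		if (!regs[i].reserved)
			regs[i].nomap = true;

	for (size_t i = 0; i < n; i++)
		if (inside(&regs[i], ubase, usize))
			regs[i].nomap = false;
}
```

Called with the crash kernel range reported earlier in the thread (base 0x0e800000, size 0x20000000), the low memory below it ends up NOMAP while a reserved region elsewhere keeps its mapping. The old `memblock_cap_memory_range()` call instead removed everything outside the range from memblock.memory.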
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-22 8:33 ` AKASHI Takahiro (?) @ 2017-12-23 19:51 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-23 19:51 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: >> Hello Akashi, >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > Bhupesh, >> > >> > Can you test the patch attached below, please? >> > >> > It is intended to retain already-reserved regions (ACPI reclaim memory >> > in this case) in system ram (i.e. memblock.memory) without explicitly >> > exporting them via usable-memory-range. >> > (I still have to figure out what the side-effect of this patch is.) 
>> > >> > Thanks, >> > -Takahiro AKASHI >> > >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel >> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro >> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> >>> > > >> > Bhupesh, Ard, >> >> >>> > > >> > >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> >>> > > >> >> Hi Ard, Akashi >> >> >>> > > >> >> >> >> >>> > > >> > (snip) >> >> >>> > > >> > >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> >>> > > >> >> , for details) >> >> >>> > > >> > >> >> >>> > > >> > Right. >> >> >>> > > >> > >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >> >> >>> > > >> >> with the crashkernel memory range: >> >> >>> > > >> >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> >>> > > >> >> address_cells, size_cells); >> >> >>> > > >> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> >>> > > >> >> , for details) >> >> >>> > > >> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> >>> > > >> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: >> >> >>> > > >> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> >>> > > >> >> -r`.img --reuse-cmdline -d >> >> >>> > > >> >> >> >> >>> > > >> >> [snip..] 
>> >> >>> > > >> >>
>> >> >>> > > >> >> Reserved memory range
>> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> Coredump memory ranges
>> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >>> > > >> >>
>> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >>> > > >> >> {
>> >> >>> > > >> >> 	struct memblock_region reg = {
>> >> >>> > > >> >> 		.size = 0,
>> >> >>> > > >> >> 	};
>> >> >>> > > >> >>
>> >> >>> > > >> >> 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >>> > > >> >>
>> >> >>> > > >> >> 	if (reg.size)
>> >> >>> > > >> >> 		//memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >> >>> > > >> >> }
>> >> >>> > > >> >
>> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >>> > > >> >
>> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >>> > > >> >> fail.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> >>> > > >> >> dt node 'linux,usable-memory-range' >> >> >>> > > >> > >> >> >>> > > >> > I still don't understand why we need to carry over the information >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? >> >> >>> > > >> > >> >> >>> > > >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are >> >> >>> > > >> memblock_reserve()'d now. >> >> >>> > > > >> >> >>> > > > For my better understandings, who is actually accessing such regions >> >> >>> > > > during boot time, uefi itself or efistub? >> >> >>> > > > >> >> >>> > > >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For >> >> >>> > > instance, on QEMU we have >> >> >>> > > >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> >>> > > 01000013) >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > >> >> >>> > > covered by >> >> >>> > > >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> >>> > > ... >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> >>> > >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting >> >> >>> > UEFI boot services. >> >> >>> > >> >> >>> > > >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >> >> >>> > > >> when booting the next kernel. >> >> >>> > > > >> >> >>> > > > not really. >> >> >>> > > > >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >> >> >>> > > >> > on crash dump kernel?) >> >> >>> > > >> > >> >> >>> > > >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> >>> > > >> regions only revealed the bug, not created it (given that other >> >> >>> > > >> memblock_reserve regions may be affected as well) >> >> >>> > > > >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is >> >> >>> > > > exposed to user space (via proc/iomem). >> >> >>> > > > >> >> >>> > > >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them >> >> >>> > > as 'System RAM'. Do you think that could solve this? >> >> >>> > >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >> >> >>> > marking them under another name in /proc/iomem would also be good in order >> >> >>> > not to allocate them as part of crash kernel's memory. >> >> >>> > >> >> >>> > But I'm not still convinced that we should export them in useable- >> >> >>> > memory-range to crash dump kernel. They will be accessed through >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram >> >> >>> > (or memblocks), I guess. >> >> >>> > -> Bhupesh? >> >> >>> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >> >> >>> the memory according to the efi memmap? For kdump kernel anything other >> >> >>> than usable memory (which is from the dt node instead) should be >> >> >>> reinitialized according to efi passed info, no? >> >> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as >> >> >> usable-memory-range by fdt_enforce_memory_region(). 
>> >> >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well >> >> >> with multiple entries in usable-memory-range. >> >> >> >> >> > >> >> > In any case, the root of the problem is that memory regions lose their >> >> > 'memory' annotation due to the way the memory map is mangled before >> >> > being supplied to the kexec kernel. >> >> > >> >> > Would it be possible to classify all memory that we want to hide from >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), >> >> > so this seems to be the most appropriate way to deal with the host >> >> > kernel's memory contents. >> >> >> >> Hmm. wouldn't appending the acpi reclaim regions to >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel >> >> be better? Because its indirectly achieving a similar objective >> >> (although may be a subset of all System RAM regions on the primary >> >> kernel's memory). >> >> >> >> I am not aware of the background about the current kexec-tools >> >> implementation where we add only the crashkernel range to the dtb >> >> being passed to the crashkernel. >> >> >> >> Probably Akashi can answer better, as to how we arrived at this design >> >> approach and why we didn't want to expose all System RAM regions (i.e. >> >> ! NOMPAP regions) to the crashkernel. >> >> >> >> I am suspecting that some issues were seen/meet when the System RAM (! >> >> NOMAP regions) were exposed to the crashkernel, and that's why we >> >> finalized on this design approach, but this is something which is just >> >> my guess. >> >> >> >> Regards, >> >> Bhupesh >> >> >> >> >>> > >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> >>> > via a kernel command line parameter, "memmap=". >> >> >>> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via >> >> >>> e820 table. >> >> >> >> >> >> Thanks. 
>> >> >> I remember that you have explained it before.
>> >> >>
>> >> >> -Takahiro AKASHI
>> >> >>
>> >> >>> [snip]
>> >> >>>
>> >> >>> Thanks
>> >> >>> Dave
>> >
>> > ===8<==
>> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >
>> > ---
>> >  arch/arm64/mm/init.c | 10 ++++++++--
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> > index 00e7b900ca41..8175db94257b 100644
>> > --- a/arch/arm64/mm/init.c
>> > +++ b/arch/arm64/mm/init.c
>> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >  	struct memblock_region reg = {
>> >  		.size = 0,
>> >  	};
>> > +	u64 idx;
>> > +	phys_addr_t start, end;
>> >
>> >  	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >
>> > -	if (reg.size)
>> > -		memblock_cap_memory_range(reg.base, reg.size);
>> > +	if (reg.size) {
>> > +		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> > +					&start, &end, NULL)
>> > +			memblock_mark_nomap(start, end - start);
>> > +		memblock_clear_nomap(reg.base, reg.size);
>> > +	}
>> >  }
>> >
>> >  void __init arm64_memblock_init(void)
>> > --
>> > 2.15.1
>> >
>>
>> Thanks for the patch. After applying this on top of
>> 4.15.0-rc4-next-20171220, there seems to be a improvement and the
>> crashkernel boot no longer hangs while trying to access the acpi
>> tables.
>>
>> However I notice a minor issue.
Please see the log below for >> reference, the following message keeps spamming the console but I see >> the crashkernel boot proceed further.: >> >> [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3 >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff] >> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff] >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff] >> [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff] >> [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff] >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff] >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff] >> [ 0.000000] NUMA: NODE_DATA(1) on node 0 >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff] >> [ 0.000000] NUMA: NODE_DATA(2) on node 0 >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff] >> [ 0.000000] NUMA: NODE_DATA(3) on node 0 >> [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode >> page_structs >> >> [snip..] >> [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode >> page_structs > > These messages shows that some "struct page" data are allocated on remote > (numa) nodes. > Since on your crash dump kernel, all the usable system memory (starting > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations. > > In my best guess, you can ingore them except for some performance penality. > This may be one side-effect. > > So does your crash dump kernel now boot successfully? > Indeed. 
The crash dump kernel now boots successfully, and the crash dump core
can be saved properly as well (I tried saving it to local disk).

However, the 'potential offnode page_structs' WARN messages flood the
console and delay the crashkernel boot for a significant duration,
which can be irritating.

Could we also consider ratelimiting this WARNING message [which seems
to come from vmemmap_verify()] when it is invoked in the context of the
crash kernel, in addition to making the change you suggested above?

Thanks for the help.

Regards,
Bhupesh
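To make the ratelimiting request concrete, here is a minimal user-space sketch of a burst limiter wrapped around the warning. It rests on several assumptions: `struct warn_limit`, `warn_allowed()` and `warn_offnode()` are hypothetical names, the limiter is count-based rather than time-based, and nothing here is taken from mm/sparse-vmemmap.c. In the kernel itself, existing helpers such as pr_warn_ratelimited() or pr_warn_once() would be the natural candidates; whether they behave usefully this early in boot has not been checked here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter state: let `burst` messages through, then stay quiet. */
struct warn_limit {
	unsigned int burst;		/* messages allowed before suppression */
	unsigned int printed;		/* messages actually emitted */
	unsigned int suppressed;	/* messages dropped by the limiter */
};

/* Returns true while the burst budget is not yet used up. */
static bool warn_allowed(struct warn_limit *wl)
{
	if (wl->printed < wl->burst) {
		wl->printed++;
		return true;
	}
	wl->suppressed++;
	return false;
}

/*
 * Emit the off-node warning for one vmemmap block [start, end) unless the
 * limiter suppresses it. Returns true if the message was printed.
 */
static bool warn_offnode(struct warn_limit *wl,
			 unsigned long start, unsigned long end)
{
	if (!warn_allowed(wl))
		return false;

	fprintf(stderr, "[%lx-%lx] potential offnode page_structs\n",
		start, end - 1);
	return true;
}
```

With a burst of 5, the 32 consecutive 64 KiB blocks shown in the log above would produce five lines instead of 32, and the `suppressed` counter could be reported once at the end so that the information is not lost entirely.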
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-23 19:51 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-23 19:51 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: >> Hello Akashi, >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > Bhupesh, >> > >> > Can you test the patch attached below, please? >> > >> > It is intended to retain already-reserved regions (ACPI reclaim memory >> > in this case) in system ram (i.e. memblock.memory) without explicitly >> > exporting them via usable-memory-range. >> > (I still have to figure out what the side-effect of this patch is.) 
>> > >> > Thanks, >> > -Takahiro AKASHI >> > >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel >> >> <ard.biesheuvel@linaro.org> wrote: >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro >> >> > <takahiro.akashi@linaro.org> wrote: >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> >> >>> > > <takahiro.akashi@linaro.org> wrote: >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote: >> >> >>> > > >> > Bhupesh, Ard, >> >> >>> > > >> > >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> >>> > > >> >> Hi Ard, Akashi >> >> >>> > > >> >> >> >> >>> > > >> > (snip) >> >> >>> > > >> > >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> >>> > > >> >> , for details) >> >> >>> > > >> > >> >> >>> > > >> > Right. >> >> >>> > > >> > >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >> >> >>> > > >> >> with the crashkernel memory range: >> >> >>> > > >> >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> >>> > > >> >> address_cells, size_cells); >> >> >>> > > >> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> >>> > > >> >> , for details) >> >> >>> > > >> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> >>> > > >> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: >> >> >>> > > >> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> >>> > > >> >> -r`.img --reuse-cmdline -d >> >> >>> > > >> >> >> >> >>> > > >> >> [snip..] 
>> >> >>> > > >> >>
>> >> >>> > > >> >> Reserved memory range
>> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> Coredump memory ranges
>> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >>> > > >> >>
>> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >>> > > >> >>
>> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >>> > > >> >> {
>> >> >>> > > >> >> 	struct memblock_region reg = {
>> >> >>> > > >> >> 		.size = 0,
>> >> >>> > > >> >> 	};
>> >> >>> > > >> >>
>> >> >>> > > >> >> 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >>> > > >> >>
>> >> >>> > > >> >> 	if (reg.size)
>> >> >>> > > >> >> 		//memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
>> >> >>> > > >> >> }
>> >> >>> > > >> >
>> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >>> > > >> >
>> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >>> > > >> >> fail.
>> >> >>> > > >> >>
>> >> >>> > > >> >> 6a).
I am trying an approach now, where the ACPI reclaim regions are >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> >>> > > >> >> dt node 'linux,usable-memory-range' >> >> >>> > > >> > >> >> >>> > > >> > I still don't understand why we need to carry over the information >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? >> >> >>> > > >> > >> >> >>> > > >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are >> >> >>> > > >> memblock_reserve()'d now. >> >> >>> > > > >> >> >>> > > > For my better understandings, who is actually accessing such regions >> >> >>> > > > during boot time, uefi itself or efistub? >> >> >>> > > > >> >> >>> > > >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For >> >> >>> > > instance, on QEMU we have >> >> >>> > > >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> >>> > > 01000013) >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> >>> > > BXPC 00000001) >> >> >>> > > >> >> >>> > > covered by >> >> >>> > > >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> >>> > > ... >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> >>> > >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting >> >> >>> > UEFI boot services. >> >> >>> > >> >> >>> > > >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >> >> >>> > > >> when booting the next kernel. >> >> >>> > > > >> >> >>> > > > not really. >> >> >>> > > > >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >> >> >>> > > >> > on crash dump kernel?) >> >> >>> > > >> > >> >> >>> > > >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> >>> > > >> regions only revealed the bug, not created it (given that other >> >> >>> > > >> memblock_reserve regions may be affected as well) >> >> >>> > > > >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is >> >> >>> > > > exposed to user space (via proc/iomem). >> >> >>> > > > >> >> >>> > > >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them >> >> >>> > > as 'System RAM'. Do you think that could solve this? >> >> >>> > >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >> >> >>> > marking them under another name in /proc/iomem would also be good in order >> >> >>> > not to allocate them as part of crash kernel's memory. >> >> >>> > >> >> >>> > But I'm not still convinced that we should export them in useable- >> >> >>> > memory-range to crash dump kernel. They will be accessed through >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram >> >> >>> > (or memblocks), I guess. >> >> >>> > -> Bhupesh? >> >> >>> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >> >> >>> the memory according to the efi memmap? For kdump kernel anything other >> >> >>> than usable memory (which is from the dt node instead) should be >> >> >>> reinitialized according to efi passed info, no? >> >> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as >> >> >> usable-memory-range by fdt_enforce_memory_region(). 
>> >> >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well >> >> >> with multiple entries in usable-memory-range. >> >> >> >> >> > >> >> > In any case, the root of the problem is that memory regions lose their >> >> > 'memory' annotation due to the way the memory map is mangled before >> >> > being supplied to the kexec kernel. >> >> > >> >> > Would it be possible to classify all memory that we want to hide from >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), >> >> > so this seems to be the most appropriate way to deal with the host >> >> > kernel's memory contents. >> >> >> >> Hmm. wouldn't appending the acpi reclaim regions to >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel >> >> be better? Because its indirectly achieving a similar objective >> >> (although may be a subset of all System RAM regions on the primary >> >> kernel's memory). >> >> >> >> I am not aware of the background about the current kexec-tools >> >> implementation where we add only the crashkernel range to the dtb >> >> being passed to the crashkernel. >> >> >> >> Probably Akashi can answer better, as to how we arrived at this design >> >> approach and why we didn't want to expose all System RAM regions (i.e. >> >> ! NOMPAP regions) to the crashkernel. >> >> >> >> I am suspecting that some issues were seen/meet when the System RAM (! >> >> NOMAP regions) were exposed to the crashkernel, and that's why we >> >> finalized on this design approach, but this is something which is just >> >> my guess. >> >> >> >> Regards, >> >> Bhupesh >> >> >> >> >>> > >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> >>> > via a kernel command line parameter, "memmap=". >> >> >>> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via >> >> >>> e820 table. >> >> >> >> >> >> Thanks. 
I remember that you have explained it before. >> >> >> >> >> >> -Takahiro AKASHI >> >> >> >> >> >>> [snip] >> >> >>> >> >> >>> Thanks >> >> >>> Dave >> > >> > ===8<== >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001 >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900 >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP >> > >> > --- >> > arch/arm64/mm/init.c | 10 ++++++++-- >> > 1 file changed, 8 insertions(+), 2 deletions(-) >> > >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c >> > index 00e7b900ca41..8175db94257b 100644 >> > --- a/arch/arm64/mm/init.c >> > +++ b/arch/arm64/mm/init.c >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void) >> > struct memblock_region reg = { >> > .size = 0, >> > }; >> > + u64 idx; >> > + phys_addr_t start, end; >> > >> > of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >> > >> > - if (reg.size) >> > - memblock_cap_memory_range(reg.base, reg.size); >> > + if (reg.size) { >> > + for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE, >> > + &start, &end, NULL) >> > + memblock_mark_nomap(start, end - start); >> > + memblock_clear_nomap(reg.base, reg.size); >> > + } >> > } >> > >> > void __init arm64_memblock_init(void) >> > -- >> > 2.15.1 >> > >> >> Thanks for the patch. After applying this on top of >> 4.15.0-rc4-next-20171220, there seems to be a improvement and the >> crashkernel boot no longer hangs while trying to access the acpi >> tables. >> >> However I notice a minor issue. 
Please see the log below for >> reference, the following message keeps spamming the console but I see >> the crashkernel boot proceed further.: >> >> [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3 >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff] >> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff] >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff] >> [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff] >> [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff] >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff] >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff] >> [ 0.000000] NUMA: NODE_DATA(1) on node 0 >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff] >> [ 0.000000] NUMA: NODE_DATA(2) on node 0 >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff] >> [ 0.000000] NUMA: NODE_DATA(3) on node 0 >> [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode >> page_structs >> [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode >> page_structs >> >> [snip..] >> [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode >> page_structs > > These messages shows that some "struct page" data are allocated on remote > (numa) nodes. > Since on your crash dump kernel, all the usable system memory (starting > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations. > > In my best guess, you can ingore them except for some performance penality. > This may be one side-effect. > > So does your crash dump kernel now boot successfully? > Indeed. 
The crash dump kernel now boots successfully, and the crash dump core can be saved properly as well (I tried saving it to local disk).

However, the 'potential offnode page_structs' warning messages flood the console and delay the crashkernel boot significantly, which can be irritating. Could we also consider ratelimiting this warning [which seems to come from vmemmap_verify()] when it is invoked in the context of the crash kernel, in addition to making the change you suggested above?

Thanks for the help.

Regards,
Bhupesh
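A minimal sketch of the interval/burst rate limiting asked about above, written as a userspace C model and not as kernel code. In the kernel itself this would more likely be done with an existing helper such as pr_warn_ratelimited(); whether that fits the vmemmap_verify() call site is an assumption that would need checking. The struct ratelimit type and the ratelimit_ok() and count_printed() helpers below are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of interval/burst rate limiting. */
struct ratelimit {
    long interval;  /* window length, in arbitrary ticks */
    int burst;      /* messages allowed per window */
    long begin;     /* tick at which the current window opened */
    int printed;    /* messages emitted in the current window */
    int missed;     /* messages suppressed in the current window */
};

/* Return true if a message arriving at tick 'now' may be printed. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
    assert(rs->interval > 0 && rs->burst > 0);

    if (now - rs->begin >= rs->interval) {
        /* Window expired: open a new one and reset the counters. */
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
    }

    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }

    rs->missed++;
    return false;
}

/*
 * Feed 'count' messages, one per tick starting at tick 0, and return
 * how many of them pass the limiter.
 */
static int count_printed(struct ratelimit *rs, int count)
{
    int printed = 0;

    for (int i = 0; i < count; i++)
        if (ratelimit_ok(rs, i))
            printed++;

    return printed;
}
```

With a long interval and a small burst, a run of back-to-back warnings like the ones in the log would be cut down to the first few, and the suppressed count could be reported once per window.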
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-23 19:51 ` Bhupesh Sharma (?) @ 2017-12-25 3:25 ` AKASHI Takahiro -1 siblings, 0 replies; 135+ messages in thread From: AKASHI Takahiro @ 2017-12-25 3:25 UTC (permalink / raw) To: Bhupesh Sharma Cc: Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote: > On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: > >> Hello Akashi, > >> > >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> > Bhupesh, > >> > > >> > Can you test the patch attached below, please? > >> > > >> > It is intended to retain already-reserved regions (ACPI reclaim memory > >> > in this case) in system ram (i.e. memblock.memory) without explicitly > >> > exporting them via usable-memory-range. > >> > (I still have to figure out what the side-effect of this patch is.) 
> >> > > >> > Thanks, > >> > -Takahiro AKASHI > >> > > >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > >> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro > >> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> >>> > > >> > Bhupesh, Ard, > >> >> >>> > > >> > > >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >> >>> > > >> >> Hi Ard, Akashi > >> >> >>> > > >> >> > >> >> >>> > > >> > (snip) > >> >> >>> > > >> > > >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any > >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. > >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >> >>> > > >> >> , for details) > >> >> >>> > > >> > > >> >> >>> > > >> > Right. > >> >> >>> > > >> > > >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >> >> >>> > > >> >> with the crashkernel memory range: > >> >> >>> > > >> >> > >> >> >>> > > >> >> /* add linux,usable-memory-range */ > >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >> >>> > > >> >> address_cells, size_cells); > >> >> >>> > > >> >> > >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >> >>> > > >> >> , for details) > >> >> >>> > > >> >> > >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >> >>> > > >> >> > >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: > >> >> >>> > > >> >> > >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >> >>> > > >> >> -r`.img --reuse-cmdline -d > >> >> >>> > > >> >> > >> >> >>> > > >> >> [snip..] 
> >> >> >>> > > >> >> > >> >> >>> > > >> >> Reserved memory range > >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0) > >> >> >>> > > >> >> > >> >> >>> > > >> >> Coredump memory ranges > >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0) > >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0) > >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0) > >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0) > >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0) > >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0) > >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0) > >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0) > >> >> >>> > > >> >> > >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the > >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside > >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below): > >> >> >>> > > >> >> > >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void) > >> >> >>> > > >> >> { > >> >> >>> > > >> >> struct memblock_region reg = { > >> >> >>> > > >> >> .size = 0, > >> >> >>> > > >> >> }; > >> >> >>> > > >> >> > >> >> >>> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); > >> >> >>> > > >> >> > >> >> >>> > > >> >> if (reg.size) > >> >> >>> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* > >> >> >>> > > >> >> comment this out */ > >> >> >>> > > >> >> } > >> >> >>> > > >> > > >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on > >> >> >>> > > >> > memory contents of the *crashed* kernel. > >> >> >>> > > >> > > >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem. > >> >> >>> > > >> >> > >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >> >>> > > >> >> fail. > >> >> >>> > > >> >> > >> >> >>> > > >> >> 6a). 
I am trying an approach now, where the ACPI reclaim regions are
> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >>> > > >> >
> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understanding,
> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >>> > > >
> >> >> >>> > > > For my better understanding, who is actually accessing such regions
> >> >> >>> > > > during boot time, uefi itself or efistub?
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >>> > > instance, on QEMU we have
> >> >> >>> > >
> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001
> >> >> >>> > > 01000013)
> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > >
> >> >> >>> > > covered by
> >> >> >>> > >
> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >>> > > ...
> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >>> >
> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >>> > UEFI boot services.
> >> >> >>> >
> >> >> >>> > >
> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >>> > > >> when booting the next kernel.
> >> >> >>> > > >
> >> >> >>> > > > not really.
> >> >> >>> > > >
> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >>> > > >
> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >>> > > > exposed to user space (via proc/iomem).
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >>> >
> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >>> >
> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >>> > usable-memory-range to crash dump kernel. They will be accessed through
> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> >>> > (or memblocks), I guess.
> >> >> >>> > -> Bhupesh?
> >> >> >>>
> >> >> >>> I forgot how arm64 kernel retrieves the memory ranges and initializes
> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other
> >> >> >>> than usable memory (which is from the dt node instead) should be
> >> >> >>> reinitialized according to efi passed info, no?
> >> >> >>
> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >>
> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> with multiple entries in usable-memory-range.
> >> >> >>
> >> >> >
> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> > being supplied to the kexec kernel.
> >> >> >
> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> > kernel's memory contents.
> >> >>
> >> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> be better? Because it's indirectly achieving a similar objective
> >> >> (although may be a subset of all System RAM regions on the primary
> >> >> kernel's memory).
> >> >>
> >> >> I am not aware of the background about the current kexec-tools
> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> being passed to the crashkernel.
> >> >>
> >> >> Probably Akashi can answer better, as to how we arrived at this design
> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> ! NOMAP regions) to the crashkernel.
> >> >>
> >> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> finalized on this design approach, but this is something which is just
> >> >> my guess.
> >> >>
> >> >> Regards,
> >> >> Bhupesh
> >> >>
> >> >> >>> >
> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >>> >
> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >>> e820 table.
> >> >> >>
> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >>
> >> >> >> -Takahiro AKASHI
> >> >> >>
> >> >> >>> [snip]
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Dave
> >> >
> >> > ===8<==
> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >
> >> > ---
> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> > index 00e7b900ca41..8175db94257b 100644
> >> > --- a/arch/arm64/mm/init.c
> >> > +++ b/arch/arm64/mm/init.c
> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >         struct memblock_region reg = {
> >> >                 .size = 0,
> >> >         };
> >> > +       u64 idx;
> >> > +       phys_addr_t start, end;
> >> >
> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >
> >> > -       if (reg.size)
> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> > +       if (reg.size) {
> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> > +                                       &start, &end, NULL)
> >> > +                       memblock_mark_nomap(start, end - start);
> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> > +       }
> >> >  }
> >> >
> >> >  void __init arm64_memblock_init(void)
> >> > --
> >> > 2.15.1
> >> >
> >>
> >> Thanks for the patch. After applying this on top of
> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> crashkernel boot no longer hangs while trying to access the acpi
> >> tables.
> >>
> >> However I notice a minor issue. Please see the log below for
> >> reference, the following message keeps spamming the console but I see
> >> the crashkernel boot proceed further.:
> >>
> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
> >> page_structs
> >>
> >> [snip..]
> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
> >> page_structs
> >
> > These messages show that some "struct page" data are allocated on remote
> > (numa) nodes.
> > Since on your crash dump kernel, all the usable system memory (starting
> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >
> > In my best guess, you can ignore them except for some performance penalty.
> > This may be one side-effect.
> >
> > So does your crash dump kernel now boot successfully?
> >
>
> Indeed. The crash dump kernel now boots successfully and the crash
> dump core can be saved properly as well (I tried saving it to local
> disk).

Thank you for the confirmation.
(I'd suggest examining the core dump with the crash utility.)

> However, the 'potential offnode page_structs' WARN messages hog the
> console and delay crashkernel boot for a significant duration, which
> can be irritating.
>
> Can we also consider ratelimiting this WARNING message [which seems to
> come from vmemmap_verify()] if invoked in the context of crash kernel,
> in addition to making the above change suggested by you.

Well, we may be able to change pr_warn() to pr_warn_once() here,
but I hope that adding "numa=off" to kernel command line should also work.

Thanks,
-Takahiro AKASHI

> Thanks for the help.
>
> Regards,
> Bhupesh
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-25 3:25 ` AKASHI Takahiro
From: AKASHI Takahiro @ 2017-12-25 3:25 UTC
To: Bhupesh Sharma
Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec, James Morse, Bhupesh SHARMA, Dave Young, linux-arm-kernel

On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> Hello Akashi,
> >>
> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> <takahiro.akashi@linaro.org> wrote:
> >> > Bhupesh,
> >> >
> >> > Can you test the patch attached below, please?
> >> >
> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> > exporting them via usable-memory-range.
> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >
> >> > Thanks,
> >> > -Takahiro AKASHI
> >> >
> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> <ard.biesheuvel@linaro.org> wrote:
> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >>> > > >> >
> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >>> > > >> >>
> >> >> >>> > > >> > (snip)
> >> >> >>> > > >> >
> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >
> >> >> >>> > > >> > Right.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> /* add linux,usable-memory-range */
> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >>> > > >> >>         PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >>> > > >> >>         address_cells, size_cells);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> [snip..]
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Reserved memory range
> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >>> > > >> >> {
> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >>> > > >> >>                 .size = 0,
> >> >> >>> > > >> >>         };
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         if (reg.size)
> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >> >>> > > >> >> }
> >> >> >>> > > >> >
> >> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
> >> >> >>> > > >> >> fail.
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are
> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the
> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will
> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the
> >> >> >>> > > >> >> dt node 'linux,usable-memory-range'
> >> >> >>> > > >> >
> >> >> >>> > > >> > I still don't understand why we need to carry over the information
> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understanding,
> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of
> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them?
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after
> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and
> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec
> >> >> >>> > > >> kernel, those regions need to be preserved, which is why they are
> >> >> >>> > > >> memblock_reserve()'d now.
> >> >> >>> > > >
> >> >> >>> > > > For my better understanding, who is actually accessing such regions
> >> >> >>> > > > during boot time, uefi itself or efistub?
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. For
> >> >> >>> > > instance, on QEMU we have
> >> >> >>> > >
> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS )
> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001
> >> >> >>> > > 01000013)
> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001
> >> >> >>> > > BXPC 00000001)
> >> >> >>> > >
> >> >> >>> > > covered by
> >> >> >>> > >
> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...]
> >> >> >>> > > ...
> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...]
> >> >> >>> >
> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting
> >> >> >>> > UEFI boot services.
> >> >> >>> >
> >> >> >>> > >
> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table
> >> >> >>> > > >> when booting the next kernel.
> >> >> >>> > > >
> >> >> >>> > > > not really.
> >> >> >>> > > >
> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code
> >> >> >>> > > >> > on crash dump kernel?)
> >> >> >>> > > >> >
> >> >> >>> > > >>
> >> >> >>> > > >> I don't think so. And the change to the handling of ACPI reclaim
> >> >> >>> > > >> regions only revealed the bug, not created it (given that other
> >> >> >>> > > >> memblock_reserve regions may be affected as well)
> >> >> >>> > > >
> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing
> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one.
> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is
> >> >> >>> > > > exposed to user space (via proc/iomem).
> >> >> >>> > > >
> >> >> >>> > >
> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them
> >> >> >>> > > as 'System RAM'. Do you think that could solve this?
> >> >> >>> >
> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and
> >> >> >>> > marking them under another name in /proc/iomem would also be good in order
> >> >> >>> > not to allocate them as part of crash kernel's memory.
> >> >> >>> >
> >> >> >>> > But I'm still not convinced that we should export them in
> >> >> >>> > usable-memory-range to crash dump kernel. They will be accessed through
> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram
> >> >> >>> > (or memblocks), I guess.
> >> >> >>> > -> Bhupesh?
> >> >> >>>
> >> >> >>> I forgot how arm64 kernel retrieves the memory ranges and initializes
> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all
> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other
> >> >> >>> than usable memory (which is from the dt node instead) should be
> >> >> >>> reinitialized according to efi passed info, no?
> >> >> >>
> >> >> >> All the regions exported in efi memmap will be added to memblock.memory
> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as
> >> >> >> usable-memory-range by fdt_enforce_memory_region().
> >> >> >>
> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
> >> >> >> with multiple entries in usable-memory-range.
> >> >> >>
> >> >> >
> >> >> > In any case, the root of the problem is that memory regions lose their
> >> >> > 'memory' annotation due to the way the memory map is mangled before
> >> >> > being supplied to the kexec kernel.
> >> >> >
> >> >> > Would it be possible to classify all memory that we want to hide from
> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
> >> >> > so this seems to be the most appropriate way to deal with the host
> >> >> > kernel's memory contents.
> >> >>
> >> >> Hmm. wouldn't appending the acpi reclaim regions to
> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
> >> >> be better? Because it's indirectly achieving a similar objective
> >> >> (although may be a subset of all System RAM regions on the primary
> >> >> kernel's memory).
> >> >>
> >> >> I am not aware of the background about the current kexec-tools
> >> >> implementation where we add only the crashkernel range to the dtb
> >> >> being passed to the crashkernel.
> >> >>
> >> >> Probably Akashi can answer better, as to how we arrived at this design
> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
> >> >> ! NOMAP regions) to the crashkernel.
> >> >>
> >> >> I am suspecting that some issues were seen/met when the System RAM (!
> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> finalized on this design approach, but this is something which is just
> >> >> my guess.
> >> >>
> >> >> Regards,
> >> >> Bhupesh
> >> >>
> >> >> >>> >
> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >>> >
> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >>> e820 table.
> >> >> >>
> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >>
> >> >> >> -Takahiro AKASHI
> >> >> >>
> >> >> >>> [snip]
> >> >> >>>
> >> >> >>> Thanks
> >> >> >>> Dave
> >> >
> >> > ===8<==
> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >
> >> > ---
> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> > index 00e7b900ca41..8175db94257b 100644
> >> > --- a/arch/arm64/mm/init.c
> >> > +++ b/arch/arm64/mm/init.c
> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >         struct memblock_region reg = {
> >> >                 .size = 0,
> >> >         };
> >> > +       u64 idx;
> >> > +       phys_addr_t start, end;
> >> >
> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >
> >> > -       if (reg.size)
> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> > +       if (reg.size) {
> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> > +                                       &start, &end, NULL)
> >> > +                       memblock_mark_nomap(start, end - start);
> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> > +       }
> >> >  }
> >> >
> >> >  void __init arm64_memblock_init(void)
> >> > --
> >> > 2.15.1
> >> >
> >>
> >> Thanks for the patch. After applying this on top of
> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> crashkernel boot no longer hangs while trying to access the acpi
> >> tables.
> >>
> >> However I notice a minor issue. Please see the log below for
> >> reference, the following message keeps spamming the console but I see
> >> the crashkernel boot proceed further.:
> >>
> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode
> >> page_structs
> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode
> >> page_structs
> >>
> >> [snip..]
> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode
> >> page_structs
> >
> > These messages show that some "struct page" data are allocated on remote
> > (numa) nodes.
> > Since on your crash dump kernel, all the usable system memory (starting
> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >
> > In my best guess, you can ignore them except for some performance penalty.
> > This may be one side-effect.
> >
> > So does your crash dump kernel now boot successfully?
> >
>
> Indeed. The crash dump kernel now boots successfully and the crash
> dump core can be saved properly as well (I tried saving it to local
> disk).

Thank you for the confirmation.
(I'd suggest examining the core dump with the crash utility.)

> However, the 'potential offnode page_structs' WARN messages hog the
> console and delay crashkernel boot for a significant duration, which
> can be irritating.
>
> Can we also consider ratelimiting this WARNING message [which seems to
> come from vmemmap_verify()] if invoked in the context of crash kernel,
> in addition to making the above change suggested by you.

Well, we may be able to change pr_warn() to pr_warn_once() here,
but I hope that adding "numa=off" to kernel command line should also work.

Thanks,
-Takahiro AKASHI

> Thanks for the help.
>
> Regards,
> Bhupesh
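
The difference between capping memory with memblock_cap_memory_range() and marking it NOMAP, as discussed in this message, can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: struct toy_region and the toy_*() helpers are hypothetical names invented for illustration and are not kernel APIs, and each region is treated as wholly reserved or wholly free, whereas real memblock code splits regions at range boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memory region; NOT the kernel's struct memblock_region. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory, e.g. ACPI tables */
	bool nomap;	/* models MEMBLOCK_NOMAP: known, but not in the linear map */
	bool present;	/* false once the region has been dropped entirely */
};

/* True if the region lies entirely inside [lo, lo + len). */
static bool toy_inside(const struct toy_region *r, uint64_t lo, uint64_t len)
{
	return r->base >= lo && r->base + r->size <= lo + len;
}

/* Models the capping behaviour: drop everything outside the usable range. */
static void toy_cap(struct toy_region *r, size_t n, uint64_t lo, uint64_t len)
{
	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!toy_inside(&r[i], lo, len))
			r[i].present = false;
}

/*
 * Models the NOMAP behaviour: keep every region, flag the free
 * (non-reserved) ones NOMAP, then clear NOMAP on the usable range.
 * Reserved regions are never flagged, so they stay mapped.
 */
static void toy_mark_nomap(struct toy_region *r, size_t n,
			   uint64_t lo, uint64_t len)
{
	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!r[i].reserved)
			r[i].nomap = true;
	for (size_t i = 0; i < n; i++)
		if (toy_inside(&r[i], lo, len))
			r[i].nomap = false;
}

/* A region is usable as ordinary mapped memory only if present and not NOMAP. */
static bool toy_is_mapped(const struct toy_region *r)
{
	return r->present && !r->nomap;
}
```

With three regions (low RAM at 0x0, the crash kernel range 0x0e800000-0x2e7fffff from the kexec output quoted in this thread, and a reserved ACPI-style region at a made-up address), toy_cap() leaves only the crash kernel range, while toy_mark_nomap() keeps the reserved region mapped and only hides the free low RAM.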
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
@ 2017-12-25 3:25 ` AKASHI Takahiro
From: AKASHI Takahiro @ 2017-12-25 3:25 UTC
To: linux-arm-kernel

On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote:
> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote:
> >> Hello Akashi,
> >>
> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro
> >> <takahiro.akashi@linaro.org> wrote:
> >> > Bhupesh,
> >> >
> >> > Can you test the patch attached below, please?
> >> >
> >> > It is intended to retain already-reserved regions (ACPI reclaim memory
> >> > in this case) in system ram (i.e. memblock.memory) without explicitly
> >> > exporting them via usable-memory-range.
> >> > (I still have to figure out what the side-effect of this patch is.)
> >> >
> >> > Thanks,
> >> > -Takahiro AKASHI
> >> >
> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote:
> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel
> >> >> <ard.biesheuvel@linaro.org> wrote:
> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro
> >> >> > <takahiro.akashi@linaro.org> wrote:
> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote:
> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote:
> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro
> >> >> >>> > > <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote:
> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro
> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote:
> >> >> >>> > > >> > Bhupesh, Ard,
> >> >> >>> > > >> >
> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote:
> >> >> >>> > > >> >> Hi Ard, Akashi
> >> >> >>> > > >> >>
> >> >> >>> > > >> > (snip)
> >> >> >>> > > >> >
> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to
> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any
> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory.
> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >
> >> >> >>> > > >> > Right.
> >> >> >>> > > >> >
> >> >> >>> > > >> >> 1). Now when 'kexec -p' is executed, this node is patched up only
> >> >> >>> > > >> >> with the crashkernel memory range:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> /* add linux,usable-memory-range */
> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen");
> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset,
> >> >> >>> > > >> >>         PROP_USABLE_MEM_RANGE, &crash_reserved_mem,
> >> >> >>> > > >> >>         address_cells, size_cells);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465
> >> >> >>> > > >> >> , for details)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether
> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As,
> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with
> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges'
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this
> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same:
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname
> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> [snip..]
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Reserved memory range
> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >>> > > >> >>
> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >>> > > >> >> {
> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >>> > > >> >>                 .size = 0,
> >> >> >>> > > >> >>         };
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >>> > > >> >>
> >> >> >>> > > >> >>         if (reg.size)
> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >> >>> > > >> >> }
> >> >> >>> > > >> >
> >> >> >>> > > >> > Please just don't do that. It can cause fatal damage to the
> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >>> > > >> > > >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem. > >> >> >>> > > >> >> > >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not > >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >> >>> > > >> >> fail. > >> >> >>> > > >> >> > >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >> >>> > > >> >> dt node 'linux,usable-memory-range' > >> >> >>> > > >> > > >> >> >>> > > >> > I still don't understand why we need to carry over the information > >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of > >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >> >> >>> > > >> > > >> >> >>> > > >> > >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are > >> >> >>> > > >> memblock_reserve()'d now. > >> >> >>> > > > > >> >> >>> > > > For my better understandings, who is actually accessing such regions > >> >> >>> > > > during boot time, uefi itself or efistub? > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >> >> >>> > > instance, on QEMU we have > >> >> >>> > > > >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >> >>> > > 01000013) > >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >> >>> > > BXPC 00000001) > >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >> >>> > > BXPC 00000001) > >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >> >>> > > BXPC 00000001) > >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >> >>> > > BXPC 00000001) > >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >> >>> > > BXPC 00000001) > >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >> >>> > > BXPC 00000001) > >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >> >>> > > BXPC 00000001) > >> >> >>> > > > >> >> >>> > > covered by > >> >> >>> > > > >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >> >>> > > ... > >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >> >>> > > >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting > >> >> >>> > UEFI boot services. > >> >> >>> > > >> >> >>> > > > >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >> >> >>> > > >> when booting the next kernel. > >> >> >>> > > > > >> >> >>> > > > not really. > >> >> >>> > > > > >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> >> >>> > > >> > on crash dump kernel?) > >> >> >>> > > >> > > >> >> >>> > > >> > >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim > >> >> >>> > > >> regions only revealed the bug, not created it (given that other > >> >> >>> > > >> memblock_reserve regions may be affected as well) > >> >> >>> > > > > >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing > >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. > >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >> >> >>> > > > exposed to user space (via proc/iomem). > >> >> >>> > > > > >> >> >>> > > > >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >> >> >>> > > as 'System RAM'. Do you think that could solve this? > >> >> >>> > > >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >> >> >>> > marking them under another name in /proc/iomem would also be good in order > >> >> >>> > not to allocate them as part of crash kernel's memory. > >> >> >>> > > >> >> >>> > But I'm not still convinced that we should export them in useable- > >> >> >>> > memory-range to crash dump kernel. They will be accessed through > >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >> >> >>> > (or memblocks), I guess. > >> >> >>> > -> Bhupesh? > >> >> >>> > >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >> >> >>> the memory according to the efi memmap? For kdump kernel anything other > >> >> >>> than usable memory (which is from the dt node instead) should be > >> >> >>> reinitialized according to efi passed info, no? > >> >> >> > >> >> >> All the regions exported in efi memmap will be added to memblock.memory > >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as > >> >> >> usable-memory-range by fdt_enforce_memory_region(). 
> >> >> >> > >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> >> >> with multiple entries in usable-memory-range. > >> >> >> > >> >> > > >> >> > In any case, the root of the problem is that memory regions lose their > >> >> > 'memory' annotation due to the way the memory map is mangled before > >> >> > being supplied to the kexec kernel. > >> >> > > >> >> > Would it be possible to classify all memory that we want to hide from > >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped > >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > >> >> > so this seems to be the most appropriate way to deal with the host > >> >> > kernel's memory contents. > >> >> > >> >> Hmm. wouldn't appending the acpi reclaim regions to > >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel > >> >> be better? Because its indirectly achieving a similar objective > >> >> (although may be a subset of all System RAM regions on the primary > >> >> kernel's memory). > >> >> > >> >> I am not aware of the background about the current kexec-tools > >> >> implementation where we add only the crashkernel range to the dtb > >> >> being passed to the crashkernel. > >> >> > >> >> Probably Akashi can answer better, as to how we arrived at this design > >> >> approach and why we didn't want to expose all System RAM regions (i.e. > >> >> ! NOMPAP regions) to the crashkernel. > >> >> > >> >> I am suspecting that some issues were seen/meet when the System RAM (! > >> >> NOMAP regions) were exposed to the crashkernel, and that's why we > >> >> finalized on this design approach, but this is something which is just > >> >> my guess. > >> >> > >> >> Regards, > >> >> Bhupesh > >> >> > >> >> >>> > > >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel > >> >> >>> > via a kernel command line parameter, "memmap=". 
> >> >> >>> > >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via > >> >> >>> e820 table. > >> >> >> > >> >> >> Thanks. I remember that you have explained it before. > >> >> >> > >> >> >> -Takahiro AKASHI > >> >> >> > >> >> >>> [snip] > >> >> >>> > >> >> >>> Thanks > >> >> >>> Dave > >> > > >> > ===8<== > >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001 > >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org> > >> > Date: Thu, 21 Dec 2017 19:14:23 +0900 > >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP > >> > > >> > --- > >> > arch/arm64/mm/init.c | 10 ++++++++-- > >> > 1 file changed, 8 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > >> > index 00e7b900ca41..8175db94257b 100644 > >> > --- a/arch/arm64/mm/init.c > >> > +++ b/arch/arm64/mm/init.c > >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void) > >> > struct memblock_region reg = { > >> > .size = 0, > >> > }; > >> > + u64 idx; > >> > + phys_addr_t start, end; > >> > > >> > of_scan_flat_dt(early_init_dt_scan_usablemem, ®); > >> > > >> > - if (reg.size) > >> > - memblock_cap_memory_range(reg.base, reg.size); > >> > + if (reg.size) { > >> > + for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE, > >> > + &start, &end, NULL) > >> > + memblock_mark_nomap(start, end - start); > >> > + memblock_clear_nomap(reg.base, reg.size); > >> > + } > >> > } > >> > > >> > void __init arm64_memblock_init(void) > >> > -- > >> > 2.15.1 > >> > > >> > >> Thanks for the patch. After applying this on top of > >> 4.15.0-rc4-next-20171220, there seems to be a improvement and the > >> crashkernel boot no longer hangs while trying to access the acpi > >> tables. > >> > >> However I notice a minor issue. 
Please see the log below for
> >> reference, the following message keeps spamming the console but I see
> >> the crashkernel boot proceed further:
> >>
> >> [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> [ 0.000000] NUMA: NODE_DATA(1) on node 0
> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> [ 0.000000] NUMA: NODE_DATA(2) on node 0
> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> [ 0.000000] NUMA: NODE_DATA(3) on node 0
> >> [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> >> [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> >> [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> >> [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> >> [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> >> [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> >>
> >> [snip..]
> >> [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
> >
> > These messages show that some "struct page" data are allocated on remote
> > (numa) nodes.
> > Since on your crash dump kernel, all the usable system memory (starting
> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >
> > In my best guess, you can ignore them except for some performance penalty.
> > This may be one side-effect.
> >
> > So does your crash dump kernel now boot successfully?
> >
>
> Indeed. The crash dump kernel now boots successfully and the crash
> dump core can be saved properly as well (I tried saving it to local
> disk).

Thank you for the confirmation.
(I'd like to suggest you to examine the core dump with crash utility.)

> However, the 'potential offnode page_structs' WARN messages hog the
> console and delay crashkernel boot for a significant duration, which
> can be irritating.
>
> Can we also consider ratelimiting this WARNING message [which seems to
> come from vmemmap_verify()] if invoked in the context of crash kernel,
> in addition to making the above change suggested by you.

Well, we may be able to change pr_warn() to pr_warn_once() here, but
I hope that adding "numa=off" to kernel command line should also work.

Thanks,
-Takahiro AKASHI

> Thanks for the help.
>
> Regards,
> Bhupesh

^ permalink raw reply [flat|nested] 135+ messages in thread
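The policy of the "mark unusable memory as NOMAP" patch quoted in this message can be pictured with a small user-space model in C. This is only a sketch and not kernel code: struct toy_region and toy_enforce_usable_range() are hypothetical names, and the model assumes the regions have already been split so that each one is either fully reserved or fully free, and either fully inside or fully outside the usable range. Under those assumptions it follows the same two steps as the patch: every free range is flagged NOMAP, then the flag is cleared again for the usable range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one memblock.memory entry; NOT the kernel's struct. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory, e.g. ACPI tables */
	bool nomap;	/* models the MEMBLOCK_NOMAP flag */
};

/*
 * Hypothetical helper that applies the patch's policy to an array of
 * pre-split regions (see the assumptions in the text above).
 */
static void toy_enforce_usable_range(struct toy_region *regions, size_t count,
				     uint64_t usable_base, uint64_t usable_size)
{
	uint64_t usable_end = usable_base + usable_size;
	size_t i;

	assert(regions != NULL || count == 0);

	/* No usable-memory-range given: leave the memory map untouched. */
	if (!usable_size)
		return;

	/* Step 1: every free (non-reserved) range becomes NOMAP. */
	for (i = 0; i < count; i++)
		if (!regions[i].reserved)
			regions[i].nomap = true;

	/* Step 2: the usable range itself is made mappable again. */
	for (i = 0; i < count; i++) {
		uint64_t end = regions[i].base + regions[i].size;

		if (regions[i].base >= usable_base && end <= usable_end)
			regions[i].nomap = false;
	}
}
```

With a usable range of base 0x0e800000 and size 0x20000000 (the reserved memory range reported by kexec earlier in the thread), a free region outside that range ends up with nomap set, while a region flagged as reserved keeps nomap cleared and therefore stays visible as memory to the crash dump kernel.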
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-25 3:25 ` AKASHI Takahiro (?) @ 2017-12-25 20:14 ` Bhupesh Sharma -1 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-25 20:14 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote: >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: >> >> Hello Akashi, >> >> >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro >> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> > Bhupesh, >> >> > >> >> > Can you test the patch attached below, please? >> >> > >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly >> >> > exporting them via usable-memory-range. >> >> > (I still have to figure out what the side-effect of this patch is.) 
>> >> > >> >> > Thanks, >> >> > -Takahiro AKASHI >> >> > >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel >> >> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro >> >> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> >> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: >> >> >> >>> > > >> > Bhupesh, Ard, >> >> >> >>> > > >> > >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> >> >>> > > >> >> Hi Ard, Akashi >> >> >> >>> > > >> >> >> >> >> >>> > > >> > (snip) >> >> >> >>> > > >> > >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> >> >>> > > >> >> , for details) >> >> >> >>> > > >> > >> >> >> >>> > > >> > Right. >> >> >> >>> > > >> > >> >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >> >> >> >>> > > >> >> with the crashkernel memory range: >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ >> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> >> >>> > > >> >> address_cells, size_cells); >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> >> >>> > > >> >> , for details) >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> [snip..] 
>> >> >> >>> > > >> >> >> >> >> >>> > > >> >> Reserved memory range >> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0) >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> Coredump memory ranges >> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0) >> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0) >> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0) >> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0) >> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0) >> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0) >> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0) >> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0) >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the >> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside >> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below): >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void) >> >> >> >>> > > >> >> { >> >> >> >>> > > >> >> struct memblock_region reg = { >> >> >> >>> > > >> >> .size = 0, >> >> >> >>> > > >> >> }; >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> if (reg.size) >> >> >> >>> > > >> >> //memblock_cap_memory_range(reg.base, reg.size); /* >> >> >> >>> > > >> >> comment this out */ >> >> >> >>> > > >> >> } >> >> >> >>> > > >> > >> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on >> >> >> >>> > > >> > memory contents of the *crashed* kernel. >> >> >> >>> > > >> > >> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem. >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not >> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to >> >> >> >>> > > >> >> fail. 
>> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range' >> >> >> >>> > > >> > >> >> >> >>> > > >> > I still don't understand why we need to carry over the information >> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of >> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? >> >> >> >>> > > >> > >> >> >> >>> > > >> >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec >> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are >> >> >> >>> > > >> memblock_reserve()'d now. >> >> >> >>> > > > >> >> >> >>> > > > For my better understandings, who is actually accessing such regions >> >> >> >>> > > > during boot time, uefi itself or efistub? >> >> >> >>> > > > >> >> >> >>> > > >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For >> >> >> >>> > > instance, on QEMU we have >> >> >> >>> > > >> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> >> >>> > > 01000013) >> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > >> >> >> >>> > > covered by >> >> >> >>> > > >> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> >> >>> > > ... >> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> >> >>> > >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting >> >> >> >>> > UEFI boot services. >> >> >> >>> > >> >> >> >>> > > >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >> >> >> >>> > > >> when booting the next kernel. >> >> >> >>> > > > >> >> >> >>> > > > not really. >> >> >> >>> > > > >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >> >> >> >>> > > >> > on crash dump kernel?) >> >> >> >>> > > >> > >> >> >> >>> > > >> >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other >> >> >> >>> > > >> memblock_reserve regions may be affected as well) >> >> >> >>> > > > >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is >> >> >> >>> > > > exposed to user space (via proc/iomem). >> >> >> >>> > > > >> >> >> >>> > > >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them >> >> >> >>> > > as 'System RAM'. Do you think that could solve this? >> >> >> >>> > >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order >> >> >> >>> > not to allocate them as part of crash kernel's memory. >> >> >> >>> > >> >> >> >>> > But I'm not still convinced that we should export them in useable- >> >> >> >>> > memory-range to crash dump kernel. They will be accessed through >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram >> >> >> >>> > (or memblocks), I guess. >> >> >> >>> > -> Bhupesh? >> >> >> >>> >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other >> >> >> >>> than usable memory (which is from the dt node instead) should be >> >> >> >>> reinitialized according to efi passed info, no? >> >> >> >> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as >> >> >> >> usable-memory-range by fdt_enforce_memory_region(). 
>> >> >> >> >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well >> >> >> >> with multiple entries in usable-memory-range. >> >> >> >> >> >> >> > >> >> >> > In any case, the root of the problem is that memory regions lose their >> >> >> > 'memory' annotation due to the way the memory map is mangled before >> >> >> > being supplied to the kexec kernel. >> >> >> > >> >> >> > Would it be possible to classify all memory that we want to hide from >> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped >> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), >> >> >> > so this seems to be the most appropriate way to deal with the host >> >> >> > kernel's memory contents. >> >> >> >> >> >> Hmm. wouldn't appending the acpi reclaim regions to >> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel >> >> >> be better? Because its indirectly achieving a similar objective >> >> >> (although may be a subset of all System RAM regions on the primary >> >> >> kernel's memory). >> >> >> >> >> >> I am not aware of the background about the current kexec-tools >> >> >> implementation where we add only the crashkernel range to the dtb >> >> >> being passed to the crashkernel. >> >> >> >> >> >> Probably Akashi can answer better, as to how we arrived at this design >> >> >> approach and why we didn't want to expose all System RAM regions (i.e. >> >> >> ! NOMPAP regions) to the crashkernel. >> >> >> >> >> >> I am suspecting that some issues were seen/meet when the System RAM (! >> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we >> >> >> finalized on this design approach, but this is something which is just >> >> >> my guess. >> >> >> >> >> >> Regards, >> >> >> Bhupesh >> >> >> >> >> >> >>> > >> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel >> >> >> >>> > via a kernel command line parameter, "memmap=". 
>> >> >> >>> >> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via >> >> >> >>> e820 table. >> >> >> >> >> >> >> >> Thanks. I remember that you have explained it before. >> >> >> >> >> >> >> >> -Takahiro AKASHI >> >> >> >> >> >> >> >>> [snip] >> >> >> >>> >> >> >> >>> Thanks >> >> >> >>> Dave >> >> > >> >> > ===8<== >> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001 >> >> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> >> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900 >> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP >> >> > >> >> > --- >> >> > arch/arm64/mm/init.c | 10 ++++++++-- >> >> > 1 file changed, 8 insertions(+), 2 deletions(-) >> >> > >> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c >> >> > index 00e7b900ca41..8175db94257b 100644 >> >> > --- a/arch/arm64/mm/init.c >> >> > +++ b/arch/arm64/mm/init.c >> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void) >> >> > struct memblock_region reg = { >> >> > .size = 0, >> >> > }; >> >> > + u64 idx; >> >> > + phys_addr_t start, end; >> >> > >> >> > of_scan_flat_dt(early_init_dt_scan_usablemem, ®); >> >> > >> >> > - if (reg.size) >> >> > - memblock_cap_memory_range(reg.base, reg.size); >> >> > + if (reg.size) { >> >> > + for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE, >> >> > + &start, &end, NULL) >> >> > + memblock_mark_nomap(start, end - start); >> >> > + memblock_clear_nomap(reg.base, reg.size); >> >> > + } >> >> > } >> >> > >> >> > void __init arm64_memblock_init(void) >> >> > -- >> >> > 2.15.1 >> >> > >> >> >> >> Thanks for the patch. After applying this on top of >> >> 4.15.0-rc4-next-20171220, there seems to be a improvement and the >> >> crashkernel boot no longer hangs while trying to access the acpi >> >> tables. >> >> >> >> However I notice a minor issue. 
Please see the log below for
>> >> reference, the following message keeps spamming the console but I see
>> >> the crashkernel boot proceed further:
>> >>
>> >> [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
>> >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
>> >> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
>> >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
>> >> [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
>> >> [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
>> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
>> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
>> >> [ 0.000000] NUMA: NODE_DATA(1) on node 0
>> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
>> >> [ 0.000000] NUMA: NODE_DATA(2) on node 0
>> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
>> >> [ 0.000000] NUMA: NODE_DATA(3) on node 0
>> >> [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
>> >> [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
>> >> [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
>> >> [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
>> >> [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
>> >> [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
>> >>
>> >> [snip..]
>> >> [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
>> >
>> > These messages show that some "struct page" data are allocated on remote
>> > (numa) nodes.
>> > Since on your crash dump kernel, all the usable system memory (starting
>> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
>> >
>> > In my best guess, you can ignore them except for some performance penalty.
>> > This may be one side-effect.
>> >
>> > So does your crash dump kernel now boot successfully?
>> >
>>
>> Indeed. The crash dump kernel now boots successfully and the crash
>> dump core can be saved properly as well (I tried saving it to local
>> disk).
>
> Thank you for the confirmation.
> (I'd like to suggest you to examine the core dump with crash utility.)
>
>> However, the 'potential offnode page_structs' WARN messages hog the
>> console and delay crashkernel boot for a significant duration, which
>> can be irritating.
>>
>> Can we also consider ratelimiting this WARNING message [which seems to
>> come from vmemmap_verify()] if invoked in the context of crash kernel,
>> in addition to making the above change suggested by you.
>
> Well, we may be able to change pr_warn() to pr_warn_once() here, but
> I hope that adding "numa=off" to kernel command line should also work.

Hmm, adding "numa=off" to the crashkernel bootargs works, and TBH it was
my initial thought as well, but I am not sure if this will cause any
regressions on aarch64 systems which use the crashdump feature.

I think the 2nd solution, i.e. limiting the warn message print
frequency, might be a better option.
Can you please add the following patch (maybe as a separate one) and
send it along with the patch which marks all areas other than the
crashkernel region being passed to the crashkernel as NOMAP, so that we
can get this issue fixed in the upstream aarch64 kernel:

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 17acf01791fa..4c13fe3c644d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 	int actual_node = early_pfn_to_nid(pfn);

 	if (node_distance(actual_node, node) > LOCAL_DISTANCE)
-		pr_warn("[%lx-%lx] potential offnode page_structs\n",
+		pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
 			start, end - 1);
 }

I have tested this solution on a Huawei Taishan board and can boot the
crashkernel successfully and also save the crash core properly (without
the console warn message flooding which used to hold up the crashkernel
boot).

Thanks,
Bhupesh

^ permalink raw reply related [flat|nested] 135+ messages in thread
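The behavioural difference between pr_warn() and the proposed pr_warn_once() can be illustrated with a user-space analogue in C. The sketch below is a model under stated assumptions, not the kernel macro: toy_warn_once(), toy_lines_printed and toy_report_offnode() are hypothetical names, and fprintf() stands in for the kernel's printing path. It shows that a once-only helper keeps one static flag per call site, so only the first range is reported no matter how many ranges the caller walks.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts messages that were actually emitted (for illustration only). */
static int toy_lines_printed;

/*
 * Hypothetical user-space analogue of a print-once helper: the static
 * flag lives at the call site, so that call site prints at most once.
 */
#define toy_warn_once(...)					\
	do {							\
		static bool toy_warned;				\
		if (!toy_warned) {				\
			toy_warned = true;			\
			toy_lines_printed++;			\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/*
 * Hypothetical stand-in for a caller that is invoked once per range,
 * the way the warning in the thread is triggered per vmemmap range.
 * Returns the total number of lines emitted so far.
 */
static int toy_report_offnode(unsigned long long start,
			      unsigned long long span, int ranges)
{
	int i;

	for (i = 0; i < ranges; i++) {
		unsigned long long s = start + (unsigned long long)i * span;

		toy_warn_once("[%llx-%llx] potential offnode page_structs\n",
			      s, s + span - 1);
	}
	return toy_lines_printed;
}
```

Calling toy_report_offnode() for 32 consecutive ranges emits a single line, and a second call emits nothing further. The trade-off is that every later range is dropped silently, whereas a rate limit would still let some later messages through.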
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-25 20:14 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-25 20:14 UTC (permalink / raw) To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Dave Young, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote: >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: >> >> Hello Akashi, >> >> >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro >> >> <takahiro.akashi@linaro.org> wrote: >> >> > Bhupesh, >> >> > >> >> > Can you test the patch attached below, please? >> >> > >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly >> >> > exporting them via usable-memory-range. >> >> > (I still have to figure out what the side-effect of this patch is.) 
>> >> > >> >> > Thanks, >> >> > -Takahiro AKASHI >> >> > >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel >> >> >> <ard.biesheuvel@linaro.org> wrote: >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro >> >> >> > <takahiro.akashi@linaro.org> wrote: >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> >> >> >>> > > <takahiro.akashi@linaro.org> wrote: >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote: >> >> >> >>> > > >> > Bhupesh, Ard, >> >> >> >>> > > >> > >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> >> >>> > > >> >> Hi Ard, Akashi >> >> >> >>> > > >> >> >> >> >> >>> > > >> > (snip) >> >> >> >>> > > >> > >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> >> >>> > > >> >> , for details) >> >> >> >>> > > >> > >> >> >> >>> > > >> > Right. >> >> >> >>> > > >> > >> >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >> >> >> >>> > > >> >> with the crashkernel memory range: >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ >> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> >> >>> > > >> >> address_cells, size_cells); >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> >> >>> > > >> >> , for details) >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> [snip..] 
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Reserved memory range
>> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Coredump memory ranges
>> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >> >>> > > >> >> {
>> >> >> >>> > > >> >> 	struct memblock_region reg = {
>> >> >> >>> > > >> >> 		.size = 0,
>> >> >> >>> > > >> >> 	};
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 	if (reg.size)
>> >> >> >>> > > >> >> 		//memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >> >>> > > >> >> comment this out */
>> >> >> >>> > > >> >> }
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >> >>> > > >> >> fail.
>> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range' >> >> >> >>> > > >> > >> >> >> >>> > > >> > I still don't understand why we need to carry over the information >> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of >> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? >> >> >> >>> > > >> > >> >> >> >>> > > >> >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec >> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are >> >> >> >>> > > >> memblock_reserve()'d now. >> >> >> >>> > > > >> >> >> >>> > > > For my better understandings, who is actually accessing such regions >> >> >> >>> > > > during boot time, uefi itself or efistub? >> >> >> >>> > > > >> >> >> >>> > > >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For >> >> >> >>> > > instance, on QEMU we have >> >> >> >>> > > >> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> >> >>> > > 01000013) >> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > >> >> >> >>> > > covered by >> >> >> >>> > > >> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> >> >>> > > ... >> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> >> >>> > >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting >> >> >> >>> > UEFI boot services. >> >> >> >>> > >> >> >> >>> > > >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >> >> >> >>> > > >> when booting the next kernel. >> >> >> >>> > > > >> >> >> >>> > > > not really. >> >> >> >>> > > > >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >> >> >> >>> > > >> > on crash dump kernel?) >> >> >> >>> > > >> > >> >> >> >>> > > >> >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other >> >> >> >>> > > >> memblock_reserve regions may be affected as well) >> >> >> >>> > > > >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is >> >> >> >>> > > > exposed to user space (via proc/iomem). >> >> >> >>> > > > >> >> >> >>> > > >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them >> >> >> >>> > > as 'System RAM'. Do you think that could solve this? >> >> >> >>> > >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order >> >> >> >>> > not to allocate them as part of crash kernel's memory. >> >> >> >>> > >> >> >> >>> > But I'm not still convinced that we should export them in useable- >> >> >> >>> > memory-range to crash dump kernel. They will be accessed through >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram >> >> >> >>> > (or memblocks), I guess. >> >> >> >>> > -> Bhupesh? >> >> >> >>> >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other >> >> >> >>> than usable memory (which is from the dt node instead) should be >> >> >> >>> reinitialized according to efi passed info, no? >> >> >> >> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as >> >> >> >> usable-memory-range by fdt_enforce_memory_region(). 
>> >> >> >>
>> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> >> >> with multiple entries in usable-memory-range.
>> >> >> >>
>> >> >> >
>> >> >> > In any case, the root of the problem is that memory regions lose their
>> >> >> > 'memory' annotation due to the way the memory map is mangled before
>> >> >> > being supplied to the kexec kernel.
>> >> >> >
>> >> >> > Would it be possible to classify all memory that we want to hide from
>> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> >> >> > so this seems to be the most appropriate way to deal with the host
>> >> >> > kernel's memory contents.
>> >> >>
>> >> >> Hmm. Wouldn't appending the ACPI reclaim regions to
>> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> >> >> be better? Because it's indirectly achieving a similar objective
>> >> >> (although maybe a subset of all System RAM regions on the primary
>> >> >> kernel's memory).
>> >> >>
>> >> >> I am not aware of the background about the current kexec-tools
>> >> >> implementation where we add only the crashkernel range to the dtb
>> >> >> being passed to the crashkernel.
>> >> >>
>> >> >> Probably Akashi can answer better, as to how we arrived at this design
>> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
>> >> >> ! NOMAP regions) to the crashkernel.
>> >> >>
>> >> >> I am suspecting that some issues were seen/met when the System RAM (!
>> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
>> >> >> finalized on this design approach, but this is something which is just
>> >> >> my guess.
>> >> >>
>> >> >> Regards,
>> >> >> Bhupesh
>> >> >>
>> >> >> >>> >
>> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> >> >>> > via a kernel command line parameter, "memmap=".
>> >> >> >>>
>> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >> >> >>> e820 table.
>> >> >> >>
>> >> >> >> Thanks. I remember that you have explained it before.
>> >> >> >>
>> >> >> >> -Takahiro AKASHI
>> >> >> >>
>> >> >> >>> [snip]
>> >> >> >>>
>> >> >> >>> Thanks
>> >> >> >>> Dave
>> >> >
>> >> > ===8<==
>> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >> >
>> >> > ---
>> >> > arch/arm64/mm/init.c | 10 ++++++++--
>> >> > 1 file changed, 8 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> >> > index 00e7b900ca41..8175db94257b 100644
>> >> > --- a/arch/arm64/mm/init.c
>> >> > +++ b/arch/arm64/mm/init.c
>> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >> >  	struct memblock_region reg = {
>> >> >  		.size = 0,
>> >> >  	};
>> >> > +	u64 idx;
>> >> > +	phys_addr_t start, end;
>> >> >
>> >> >  	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >
>> >> > -	if (reg.size)
>> >> > -		memblock_cap_memory_range(reg.base, reg.size);
>> >> > +	if (reg.size) {
>> >> > +		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> >> > +					&start, &end, NULL)
>> >> > +			memblock_mark_nomap(start, end - start);
>> >> > +		memblock_clear_nomap(reg.base, reg.size);
>> >> > +	}
>> >> >  }
>> >> >
>> >> >  void __init arm64_memblock_init(void)
>> >> > --
>> >> > 2.15.1
>> >> >
>> >>
>> >> Thanks for the patch. After applying this on top of
>> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
>> >> crashkernel boot no longer hangs while trying to access the acpi
>> >> tables.
>> >>
>> >> However I notice a minor issue.
Please see the log below for >> >> reference, the following message keeps spamming the console but I see >> >> the crashkernel boot proceed further.: >> >> >> >> [ 0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3 >> >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff] >> >> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff] >> >> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff] >> >> [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff] >> >> [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff] >> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff] >> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff] >> >> [ 0.000000] NUMA: NODE_DATA(1) on node 0 >> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff] >> >> [ 0.000000] NUMA: NODE_DATA(2) on node 0 >> >> [ 0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff] >> >> [ 0.000000] NUMA: NODE_DATA(3) on node 0 >> >> [ 0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode >> >> page_structs >> >> [ 0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode >> >> page_structs >> >> [ 0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode >> >> page_structs >> >> [ 0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode >> >> page_structs >> >> [ 0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode >> >> page_structs >> >> [ 0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode >> >> page_structs >> >> >> >> [snip..] >> >> [ 0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode >> >> page_structs >> > >> > These messages shows that some "struct page" data are allocated on remote >> > (numa) nodes. >> > Since on your crash dump kernel, all the usable system memory (starting >> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations. 
>> >
>> > In my best guess, you can ignore them except for some performance penalty.
>> > This may be one side-effect.
>> >
>> > So does your crash dump kernel now boot successfully?
>> >
>>
>> Indeed. The crash dump kernel now boots successfully and the crash
>> dump core can be saved properly as well (I tried saving it to local
>> disk).
>
> Thank you for the confirmation.
> (I'd like to suggest you to examine the core dump with crash utility.)
>
>> However, the 'potential offnode page_structs' WARN messages hog the
>> console and delay crashkernel boot for a significant duration, which
>> can be irritating.
>>
>> Can we also consider ratelimiting this WARNING message [which seems to
>> come from vmemmap_verify()] if invoked in the context of crash kernel,
>> in addition to making the above change suggested by you.
>
> Well, we may be able to change pr_warn() to pr_warn_once() here, but
> I hope that adding "numa=off" to kernel command line should also work.

Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was my
initial thought process as well, but I am not sure if this will cause
any regressions on aarch64 systems which use the crashdump feature.

I think the 2nd solution, i.e. limiting the warn message print
frequency, might be a better option.
Can you please add the following patch (maybe as a separate one) and
send it along with the patch which marks all areas other than the
crashkernel region being passed to the crashkernel as NOMAP, so that we
can get this issue fixed in the upstream aarch64 kernel:

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 17acf01791fa..4c13fe3c644d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 	int actual_node = early_pfn_to_nid(pfn);

 	if (node_distance(actual_node, node) > LOCAL_DISTANCE)
-		pr_warn("[%lx-%lx] potential offnode page_structs\n",
+		pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
 			start, end - 1);
 }

I have tested this solution on a Huawei Taishan board and can boot the
crashkernel successfully and also save the crash core properly (without
the console warn message flooding which used to hold up the crashkernel
boot).

Thanks,
Bhupesh

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related [flat|nested] 135+ messages in thread
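The idea behind the "arm64: kdump: mark unusable memory as NOMAP" patch quoted earlier in this message can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct region and enforce_usable_range() are hypothetical names, a "reserved" flag stands in for memblock_reserve()'d memory such as the ACPI tables, and regions are assumed never to straddle the usable range (real memblock code splits regions at the boundaries).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one memblock.memory entry. */
struct region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory, e.g. ACPI tables */
	bool nomap;	/* models the MEMBLOCK_NOMAP flag */
};

/*
 * Model of the two steps in the quoted patch:
 *  1. every free (non-reserved) region is marked NOMAP;
 *  2. NOMAP is cleared again on the usable range [ubase, ubase + usize).
 * Reserved regions are never touched by step 1, so they stay mapped.
 * Assumption: a region lies either fully inside or fully outside the
 * usable range.
 */
static void enforce_usable_range(struct region *r, size_t n,
				 uint64_t ubase, uint64_t usize)
{
	for (size_t i = 0; i < n; i++) {
		bool inside = r[i].base >= ubase &&
			      r[i].base + r[i].size <= ubase + usize;

		if (!r[i].reserved)
			r[i].nomap = true;	/* step 1: hide free memory */
		if (inside)
			r[i].nomap = false;	/* step 2: re-expose usable range */
	}
}
```

With the crashkernel range 0x0e800000-0x2e7fffff from the kexec output in this thread as the usable range, a free region below it ends up NOMAP, the crashkernel region stays mapped, and a reserved region outside the range (where the ACPI tables would live) keeps its mapping, which is why acpi_ns_lookup() no longer faults.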
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-25 20:14 ` Bhupesh Sharma 0 siblings, 0 replies; 135+ messages in thread From: Bhupesh Sharma @ 2017-12-25 20:14 UTC (permalink / raw) To: linux-arm-kernel On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote: >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro >> <takahiro.akashi@linaro.org> wrote: >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: >> >> Hello Akashi, >> >> >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro >> >> <takahiro.akashi@linaro.org> wrote: >> >> > Bhupesh, >> >> > >> >> > Can you test the patch attached below, please? >> >> > >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly >> >> > exporting them via usable-memory-range. >> >> > (I still have to figure out what the side-effect of this patch is.) 
>> >> > >> >> > Thanks, >> >> > -Takahiro AKASHI >> >> > >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel >> >> >> <ard.biesheuvel@linaro.org> wrote: >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro >> >> >> > <takahiro.akashi@linaro.org> wrote: >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro >> >> >> >>> > > <takahiro.akashi@linaro.org> wrote: >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro >> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote: >> >> >> >>> > > >> > Bhupesh, Ard, >> >> >> >>> > > >> > >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: >> >> >> >>> > > >> >> Hi Ard, Akashi >> >> >> >>> > > >> >> >> >> >> >>> > > >> > (snip) >> >> >> >>> > > >> > >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt >> >> >> >>> > > >> >> , for details) >> >> >> >>> > > >> > >> >> >> >>> > > >> > Right. >> >> >> >>> > > >> > >> >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only >> >> >> >>> > > >> >> with the crashkernel memory range: >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ >> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); >> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, >> >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, >> >> >> >>> > > >> >> address_cells, size_cells); >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 >> >> >> >>> > > >> >> , for details) >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d >> >> >> >>> > > >> >> >> >> >> >>> > > >> >> [snip..] 
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Reserved memory range
>> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> Coredump memory ranges
>> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
>> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
>> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
>> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
>> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
>> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
>> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
>> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
>> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
>> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
>> >> >> >>> > > >> >> {
>> >> >> >>> > > >> >> 	struct memblock_region reg = {
>> >> >> >>> > > >> >> 		.size = 0,
>> >> >> >>> > > >> >> 	};
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 	if (reg.size)
>> >> >> >>> > > >> >> 		//memblock_cap_memory_range(reg.base, reg.size); /*
>> >> >> >>> > > >> >> comment this out */
>> >> >> >>> > > >> >> }
>> >> >> >>> > > >> >
>> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
>> >> >> >>> > > >> > memory contents of the *crashed* kernel.
>> >> >> >>> > > >> >
>> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
>> >> >> >>> > > >> >>
>> >> >> >>> > > >> >> 6). However exposing all System RAM regions to the crashkernel is not
>> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to
>> >> >> >>> > > >> >> fail.
>> >> >> >>> > > >> >> >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range' >> >> >> >>> > > >> > >> >> >> >>> > > >> > I still don't understand why we need to carry over the information >> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of >> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? >> >> >> >>> > > >> > >> >> >> >>> > > >> >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec >> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are >> >> >> >>> > > >> memblock_reserve()'d now. >> >> >> >>> > > > >> >> >> >>> > > > For my better understandings, who is actually accessing such regions >> >> >> >>> > > > during boot time, uefi itself or efistub? >> >> >> >>> > > > >> >> >> >>> > > >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For >> >> >> >>> > > instance, on QEMU we have >> >> >> >>> > > >> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) >> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 >> >> >> >>> > > 01000013) >> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 >> >> >> >>> > > BXPC 00000001) >> >> >> >>> > > >> >> >> >>> > > covered by >> >> >> >>> > > >> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] >> >> >> >>> > > ... >> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] >> >> >> >>> > >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting >> >> >> >>> > UEFI boot services. >> >> >> >>> > >> >> >> >>> > > >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table >> >> >> >>> > > >> when booting the next kernel. >> >> >> >>> > > > >> >> >> >>> > > > not really. >> >> >> >>> > > > >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code >> >> >> >>> > > >> > on crash dump kernel?) >> >> >> >>> > > >> > >> >> >> >>> > > >> >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other >> >> >> >>> > > >> memblock_reserve regions may be affected as well) >> >> >> >>> > > > >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is >> >> >> >>> > > > exposed to user space (via proc/iomem). >> >> >> >>> > > > >> >> >> >>> > > >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them >> >> >> >>> > > as 'System RAM'. Do you think that could solve this? >> >> >> >>> > >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order >> >> >> >>> > not to allocate them as part of crash kernel's memory. >> >> >> >>> > >> >> >> >>> > But I'm not still convinced that we should export them in useable- >> >> >> >>> > memory-range to crash dump kernel. They will be accessed through >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram >> >> >> >>> > (or memblocks), I guess. >> >> >> >>> > -> Bhupesh? >> >> >> >>> >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize >> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all >> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other >> >> >> >>> than usable memory (which is from the dt node instead) should be >> >> >> >>> reinitialized according to efi passed info, no? >> >> >> >> >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as >> >> >> >> usable-memory-range by fdt_enforce_memory_region(). 
>> >> >> >>
>> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well
>> >> >> >> with multiple entries in usable-memory-range.
>> >> >> >>
>> >> >> >
>> >> >> > In any case, the root of the problem is that memory regions lose their
>> >> >> > 'memory' annotation due to the way the memory map is mangled before
>> >> >> > being supplied to the kexec kernel.
>> >> >> >
>> >> >> > Would it be possible to classify all memory that we want to hide from
>> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped
>> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(),
>> >> >> > so this seems to be the most appropriate way to deal with the host
>> >> >> > kernel's memory contents.
>> >> >>
>> >> >> Hmm. Wouldn't appending the ACPI reclaim regions to
>> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel
>> >> >> be better? Because it's indirectly achieving a similar objective
>> >> >> (although maybe a subset of all System RAM regions on the primary
>> >> >> kernel's memory).
>> >> >>
>> >> >> I am not aware of the background about the current kexec-tools
>> >> >> implementation where we add only the crashkernel range to the dtb
>> >> >> being passed to the crashkernel.
>> >> >>
>> >> >> Probably Akashi can answer better, as to how we arrived at this design
>> >> >> approach and why we didn't want to expose all System RAM regions (i.e.
>> >> >> ! NOMAP regions) to the crashkernel.
>> >> >>
>> >> >> I am suspecting that some issues were seen/met when the System RAM (!
>> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
>> >> >> finalized on this design approach, but this is something which is just
>> >> >> my guess.
>> >> >>
>> >> >> Regards,
>> >> >> Bhupesh
>> >> >>
>> >> >> >>> >
>> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
>> >> >> >>> > via a kernel command line parameter, "memmap=".
>> >> >> >>>
>> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
>> >> >> >>> e820 table.
>> >> >> >>
>> >> >> >> Thanks. I remember that you have explained it before.
>> >> >> >>
>> >> >> >> -Takahiro AKASHI
>> >> >> >>
>> >> >> >>> [snip]
>> >> >> >>>
>> >> >> >>> Thanks
>> >> >> >>> Dave
>> >> >
>> >> > ===8<==
>> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
>> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
>> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
>> >> >
>> >> > ---
>> >> >  arch/arm64/mm/init.c | 10 ++++++++--
>> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> >> > index 00e7b900ca41..8175db94257b 100644
>> >> > --- a/arch/arm64/mm/init.c
>> >> > +++ b/arch/arm64/mm/init.c
>> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
>> >> >  	struct memblock_region reg = {
>> >> >  		.size = 0,
>> >> >  	};
>> >> > +	u64 idx;
>> >> > +	phys_addr_t start, end;
>> >> >
>> >> >  	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
>> >> >
>> >> > -	if (reg.size)
>> >> > -		memblock_cap_memory_range(reg.base, reg.size);
>> >> > +	if (reg.size) {
>> >> > +		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
>> >> > +					&start, &end, NULL)
>> >> > +			memblock_mark_nomap(start, end - start);
>> >> > +		memblock_clear_nomap(reg.base, reg.size);
>> >> > +	}
>> >> >  }
>> >> >
>> >> >  void __init arm64_memblock_init(void)
>> >> > --
>> >> > 2.15.1
>> >> >
>> >>
>> >> Thanks for the patch. After applying this on top of
>> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
>> >> crashkernel boot no longer hangs while trying to access the acpi
>> >> tables.
>> >>
>> >> However I notice a minor issue. Please see the log below for
>> >> reference, the following message keeps spamming the console but I see
>> >> the crashkernel boot proceed further:
>> >>
>> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
>> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
>> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
>> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
>> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
>> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
>> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
>> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
>> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
>> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
>> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
>> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
>> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
>> >>
>> >> [snip..]
>> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
>> >
>> > These messages show that some "struct page" data are allocated on remote
>> > (numa) nodes.
>> > Since on your crash dump kernel, all the usable system memory (starting
>> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
>> >
>> > In my best guess, you can ignore them except for some performance penalty.
>> > This may be one side-effect.
>> >
>> > So does your crash dump kernel now boot successfully?
>> >
>>
>> Indeed. The crash dump kernel now boots successfully and the crash
>> dump core can be saved properly as well (I tried saving it to local
>> disk).
>
> Thank you for the confirmation.
> (I'd like to suggest you to examine the core dump with crash utility.)
>
>> However, the 'potential offnode page_structs' WARN messages hog the
>> console and delay crashkernel boot for a significant duration, which
>> can be irritating.
>>
>> Can we also consider ratelimiting this WARNING message [which seems to
>> come from vmemmap_verify()] if invoked in the context of crash kernel,
>> in addition to making the above change suggested by you.
>
> Well, we may be able to change pr_warn() to pr_warn_once() here, but
> I hope that adding "numa=off" to kernel command line should also work.

Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
my initial thought process as well, but I am not sure if this will
cause any regressions on aarch64 systems which use the crashdump feature.

I think the 2nd solution, i.e. limiting the warn message print
frequency, might be a better option. Can you please add the following
patch (maybe as a separate one) and send it along with the patch which
marks all areas other than the crashkernel region being passed to the
crashkernel as NOMAP, so that we can get this issue fixed in the upstream
aarch64 kernel:

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 17acf01791fa..4c13fe3c644d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
 	int actual_node = early_pfn_to_nid(pfn);

 	if (node_distance(actual_node, node) > LOCAL_DISTANCE)
-		pr_warn("[%lx-%lx] potential offnode page_structs\n",
+		pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
 			start, end - 1);
 }

I have tested this solution on a Huawei Taishan board and can boot the
crashkernel successfully and also save the crash core properly
(without the console warn message flooding which used to hold up the
crashkernel boot).

Thanks,
Bhupesh

^ permalink raw reply related [flat|nested] 135+ messages in thread
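The effect of switching from pr_warn() to pr_warn_once() in the patch above can be illustrated with a small user-space sketch. This is only a model under stated assumptions: verify_once() and node_distance_model() are hypothetical names invented for this illustration and are not the kernel's vmemmap_verify() or node_distance(); the static flag merely imitates the "print on the first hit, stay silent afterwards" behaviour that pr_warn_once() provides per call site.

```c
#include <stdbool.h>
#include <stdio.h>

#define LOCAL_DISTANCE 10	/* value the kernel uses for "same node" */

/* Hypothetical two-level distance table: same node or remote node. */
int node_distance_model(int a, int b)
{
	return a == b ? LOCAL_DISTANCE : 20;
}

/*
 * Hypothetical model of the patched check. Returns 1 if a warning was
 * actually printed by this call, 0 otherwise.
 */
int verify_once(int node, int actual_node,
		unsigned long start, unsigned long end)
{
	static bool warned;	/* imitates pr_warn_once()'s hidden flag */

	if (node_distance_model(actual_node, node) <= LOCAL_DISTANCE)
		return 0;	/* page structs are node-local, nothing to say */
	if (warned)
		return 0;	/* already reported once, stay quiet */

	warned = true;
	fprintf(stderr, "[%lx-%lx] potential offnode page_structs\n",
		start, end - 1);
	return 1;
}
```

Calling verify_once() for every vmemmap range, as the boot log above effectively does, would print a single line instead of one line per range. The trade-off is that the ranges after the first are no longer reported at all, which is presumably acceptable for a message that is only informational in a crash kernel.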
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP 2017-12-25 20:14 ` Bhupesh Sharma (?) @ 2017-12-26 1:32 ` Dave Young -1 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-26 1:32 UTC (permalink / raw) To: Bhupesh Sharma Cc: AKASHI Takahiro, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-efi-u79uwXL29TY76Z2rM5mHXA, Mark Rutland, James Morse, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On 12/26/17 at 01:44am, Bhupesh Sharma wrote: > On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote: > >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: > >> >> Hello Akashi, > >> >> > >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro > >> >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> > Bhupesh, > >> >> > > >> >> > Can you test the patch attached below, please? > >> >> > > >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory > >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly > >> >> > exporting them via usable-memory-range. > >> >> > (I still have to figure out what the side-effect of this patch is.) 
> >> >> > > >> >> > Thanks, > >> >> > -Takahiro AKASHI > >> >> > > >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > >> >> >> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro > >> >> >> > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >> >> >> >>> > > <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >> >> >>> > > >> <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > >> >> >> >>> > > >> > Bhupesh, Ard, > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >> >> >>> > > >> >> Hi Ard, Akashi > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> > (snip) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any > >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. > >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >> >> >>> > > >> >> , for details) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > Right. > >> >> >> >>> > > >> > > >> >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >> >> >> >>> > > >> >> with the crashkernel memory range: > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ > >> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >> >> >>> > > >> >> address_cells, size_cells); > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >> >> >>> > > >> >> , for details) > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> [snip..] 
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Reserved memory range
> >> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >> >>> > > >> >> {
> >> >> >> >>> > > >> >> 	struct memblock_region reg = {
> >> >> >> >>> > > >> >> 		.size = 0,
> >> >> >> >>> > > >> >> 	};
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 	if (reg.size)
> >> >> >> >>> > > >> >> 		//memblock_cap_memory_range(reg.base, reg.size); /*
> >> >> >> >>> > > >> >> comment this out */
> >> >> >> >>> > > >> >> }
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6).
However exposing all System RAM regions to the crashkernel is not > >> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >> >> >>> > > >> >> fail. > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range' > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > I still don't understand why we need to carry over the information > >> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of > >> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are > >> >> >> >>> > > >> memblock_reserve()'d now. > >> >> >> >>> > > > > >> >> >> >>> > > > For my better understandings, who is actually accessing such regions > >> >> >> >>> > > > during boot time, uefi itself or efistub? > >> >> >> >>> > > > > >> >> >> >>> > > > >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >> >> >> >>> > > instance, on QEMU we have > >> >> >> >>> > > > >> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >> >> >>> > > 01000013) > >> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > > >> >> >> >>> > > covered by > >> >> >> >>> > > > >> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >> >> >>> > > ... > >> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >> >> >>> > > >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting > >> >> >> >>> > UEFI boot services. > >> >> >> >>> > > >> >> >> >>> > > > >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >> >> >> >>> > > >> when booting the next kernel. > >> >> >> >>> > > > > >> >> >> >>> > > > not really. > >> >> >> >>> > > > > >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> >> >> >>> > > >> > on crash dump kernel?) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim > >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other > >> >> >> >>> > > >> memblock_reserve regions may be affected as well) > >> >> >> >>> > > > > >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing > >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. > >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >> >> >> >>> > > > exposed to user space (via proc/iomem). > >> >> >> >>> > > > > >> >> >> >>> > > > >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >> >> >> >>> > > as 'System RAM'. Do you think that could solve this? > >> >> >> >>> > > >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order > >> >> >> >>> > not to allocate them as part of crash kernel's memory. > >> >> >> >>> > > >> >> >> >>> > But I'm not still convinced that we should export them in useable- > >> >> >> >>> > memory-range to crash dump kernel. They will be accessed through > >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >> >> >> >>> > (or memblocks), I guess. > >> >> >> >>> > -> Bhupesh? > >> >> >> >>> > >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other > >> >> >> >>> than usable memory (which is from the dt node instead) should be > >> >> >> >>> reinitialized according to efi passed info, no? 
> >> >> >> >> > >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory > >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as > >> >> >> >> usable-memory-range by fdt_enforce_memory_region(). > >> >> >> >> > >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> >> >> >> with multiple entries in usable-memory-range. > >> >> >> >> > >> >> >> > > >> >> >> > In any case, the root of the problem is that memory regions lose their > >> >> >> > 'memory' annotation due to the way the memory map is mangled before > >> >> >> > being supplied to the kexec kernel. > >> >> >> > > >> >> >> > Would it be possible to classify all memory that we want to hide from > >> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped > >> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > >> >> >> > so this seems to be the most appropriate way to deal with the host > >> >> >> > kernel's memory contents. > >> >> >> > >> >> >> Hmm. wouldn't appending the acpi reclaim regions to > >> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel > >> >> >> be better? Because its indirectly achieving a similar objective > >> >> >> (although may be a subset of all System RAM regions on the primary > >> >> >> kernel's memory). > >> >> >> > >> >> >> I am not aware of the background about the current kexec-tools > >> >> >> implementation where we add only the crashkernel range to the dtb > >> >> >> being passed to the crashkernel. > >> >> >> > >> >> >> Probably Akashi can answer better, as to how we arrived at this design > >> >> >> approach and why we didn't want to expose all System RAM regions (i.e. > >> >> >> ! NOMPAP regions) to the crashkernel. > >> >> >> > >> >> >> I am suspecting that some issues were seen/meet when the System RAM (! 
> >> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> >> finalized on this design approach, but this is something which is just
> >> >> >> my guess.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Bhupesh
> >> >> >>
> >> >> >> >>> >
> >> >> >> >>> > Just FYI, on x86, ACPI tables seems to be exposed to crash dump kernel
> >> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >> >>>
> >> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >> >>> e820 table.
> >> >> >> >>
> >> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >> >>
> >> >> >> >> -Takahiro AKASHI
> >> >> >> >>
> >> >> >> >>> [snip]
> >> >> >> >>>
> >> >> >> >>> Thanks
> >> >> >> >>> Dave
> >> >> >
> >> >> > ===8<==
> >> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> >> > From: AKASHI Takahiro <takahiro.akashi-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> >> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >> >
> >> >> > ---
> >> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> >> > index 00e7b900ca41..8175db94257b 100644
> >> >> > --- a/arch/arm64/mm/init.c
> >> >> > +++ b/arch/arm64/mm/init.c
> >> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >> >  	struct memblock_region reg = {
> >> >> >  		.size = 0,
> >> >> >  	};
> >> >> > +	u64 idx;
> >> >> > +	phys_addr_t start, end;
> >> >> >
> >> >> >  	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >
> >> >> > -	if (reg.size)
> >> >> > -		memblock_cap_memory_range(reg.base, reg.size);
> >> >> > +	if (reg.size) {
> >> >> > +		for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> >> > +					&start, &end, NULL)
> >> >> > +			memblock_mark_nomap(start, end - start);
> >> >> > +		memblock_clear_nomap(reg.base, reg.size);
> >> >> > +	}
> >> >> >  }
> >> >> >
> >> >> >  void __init arm64_memblock_init(void)
> >> >> > --
> >> >> > 2.15.1
> >> >> >
> >> >>
> >> >> Thanks for the patch. After applying this on top of
> >> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> >> crashkernel boot no longer hangs while trying to access the acpi
> >> >> tables.
> >> >>
> >> >> However I notice a minor issue. Please see the log below for
> >> >> reference, the following message keeps spamming the console but I see
> >> >> the crashkernel boot proceed further:
> >> >>
> >> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> >> >>
> >> >> [snip..]
> >> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
> >> >
> >> > These messages show that some "struct page" data are allocated on remote
> >> > (numa) nodes.
> >> > Since on your crash dump kernel, all the usable system memory (starting
> >> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >> >
> >> > In my best guess, you can ignore them except for some performance penalty.
> >> > This may be one side-effect.
> >> >
> >> > So does your crash dump kernel now boot successfully?
> >> >
> >>
> >> Indeed. The crash dump kernel now boots successfully and the crash
> >> dump core can be saved properly as well (I tried saving it to local
> >> disk).
> >
> > Thank you for the confirmation.
> > (I'd like to suggest you to examine the core dump with crash utility.)
> >
> >> However, the 'potential offnode page_structs' WARN messages hog the
> >> console and delay crashkernel boot for a significant duration, which
> >> can be irritating.
> >>
> >> Can we also consider ratelimiting this WARNING message [which seems to
> >> come from vmemmap_verify()] if invoked in the context of crash kernel,
> >> in addition to making the above change suggested by you.
> >
> > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > I hope that adding "numa=off" to kernel command line should also work.
>
> Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> my initial thought process as well, but I am not sure if this will
> cause any regressions on aarch64 systems which use the crashdump feature.

It should be fine, since we use numa=off by default for all other
arches, i.e. x86, ppc64 and s390. Actually, disabling numa in the kdump
kernel can also save mm component memory usage.

>
> I think the 2nd solution, i.e. limiting the warn message print
> frequency, might be a better option. Can you please add the following
> patch (maybe as a separate one) and send it along with the patch which
> marks all areas other than the crashkernel region being passed to the
> crashkernel as NOMAP, so that we can get this issue fixed in the upstream
> aarch64 kernel:
>
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 17acf01791fa..4c13fe3c644d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>  	int actual_node = early_pfn_to_nid(pfn);
>
>  	if (node_distance(actual_node, node) > LOCAL_DISTANCE)
> -		pr_warn("[%lx-%lx] potential offnode page_structs\n",
> +		pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
>  			start, end - 1);
>  }
>
> I have tested this solution on a Huawei Taishan board and can boot the
> crashkernel successfully and also save the crash core properly
> (without the console warn message flooding which used to hold up the
> crashkernel boot).
>
> Thanks,
> Bhupesh

Thanks
Dave

^ permalink raw reply [flat|nested] 135+ messages in thread
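The difference between the two policies discussed in the quoted thread (capping memory to usable-memory-range versus marking the rest NOMAP while keeping reserved regions) can be sketched with a toy model. Everything here is an assumption made for illustration: struct region and the functions below are hypothetical and are not the kernel's memblock API, regions are treated as lying wholly inside or outside the usable range, and the ACPI table address is only a plausible placement in the gap between the coredump ranges quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a memblock region; not the kernel's struct. */
struct region {
	uint64_t base;
	uint64_t size;
	bool reserved;	/* models memblock_reserve()'d memory, e.g. ACPI tables */
	bool nomap;	/* models MEMBLOCK_NOMAP: kept out of the linear map */
	bool present;	/* still listed as memory at all */
};

/* True if the whole region lies inside [ubase, ubase + usize). */
static bool inside(const struct region *r, uint64_t ubase, uint64_t usize)
{
	return r->base >= ubase && r->base + r->size <= ubase + usize;
}

/* Capping policy: drop every region outside the usable range. */
void apply_cap(struct region *r, size_t n, uint64_t ubase, uint64_t usize)
{
	for (size_t i = 0; i < n; i++)
		if (!inside(&r[i], ubase, usize))
			r[i].present = false;
}

/*
 * NOMAP policy: hide all free (non-reserved) memory behind NOMAP,
 * leave reserved regions untouched, then re-enable the usable range.
 */
void apply_nomap(struct region *r, size_t n, uint64_t ubase, uint64_t usize)
{
	for (size_t i = 0; i < n; i++)
		if (!r[i].reserved)
			r[i].nomap = true;
	for (size_t i = 0; i < n; i++)
		if (inside(&r[i], ubase, usize))
			r[i].nomap = false;
}

bool linear_mapped(const struct region *r)
{
	return r->present && !r->nomap;
}

/* Builds a three-region layout and reports whether region idx stays mapped. */
bool mapped_after(bool use_nomap, size_t idx)
{
	struct region r[] = {
		{ 0x00000000, 0x0e800000, false, false, true }, /* first kernel's RAM */
		{ 0x0e800000, 0x20000000, false, false, true }, /* crashkernel range */
		{ 0x39620000, 0x00720000, true,  false, true }, /* ACPI tables (assumed address) */
	};
	const size_t n = sizeof(r) / sizeof(r[0]);

	if (idx >= n)
		return false;
	if (use_nomap)
		apply_nomap(r, n, 0x0e800000, 0x20000000);
	else
		apply_cap(r, n, 0x0e800000, 0x20000000);
	return linear_mapped(&r[idx]);
}
```

In this model, mapped_after(false, 2) is false (capping discards the reserved ACPI region), while mapped_after(true, 2) is true (the region survives because it is reserved), and the first kernel's RAM at index 0 stays unmapped under both policies. That mirrors the intent described for the patch, though the real behaviour depends on how memblock splits and merges regions, which this sketch does not attempt to reproduce.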
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-26 1:32 ` Dave Young 0 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-26 1:32 UTC (permalink / raw) To: Bhupesh Sharma Cc: Mark Rutland, linux-efi, Ard Biesheuvel, Matt Fleming, kexec, AKASHI Takahiro, James Morse, Bhupesh SHARMA, linux-arm-kernel On 12/26/17 at 01:44am, Bhupesh Sharma wrote: > On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: > > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote: > >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro > >> <takahiro.akashi@linaro.org> wrote: > >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: > >> >> Hello Akashi, > >> >> > >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro > >> >> <takahiro.akashi@linaro.org> wrote: > >> >> > Bhupesh, > >> >> > > >> >> > Can you test the patch attached below, please? > >> >> > > >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory > >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly > >> >> > exporting them via usable-memory-range. > >> >> > (I still have to figure out what the side-effect of this patch is.) 
> >> >> > > >> >> > Thanks, > >> >> > -Takahiro AKASHI > >> >> > > >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > >> >> >> <ard.biesheuvel@linaro.org> wrote: > >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro > >> >> >> > <takahiro.akashi@linaro.org> wrote: > >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >> >> >> >>> > > <takahiro.akashi@linaro.org> wrote: > >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote: > >> >> >> >>> > > >> > Bhupesh, Ard, > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >> >> >>> > > >> >> Hi Ard, Akashi > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> > (snip) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any > >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. > >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >> >> >>> > > >> >> , for details) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > Right. > >> >> >> >>> > > >> > > >> >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >> >> >> >>> > > >> >> with the crashkernel memory range: > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ > >> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >> >> >>> > > >> >> address_cells, size_cells); > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >> >> >>> > > >> >> , for details) > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> [snip..] 
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Reserved memory range
> >> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >> >>> > > >> >> {
> >> >> >> >>> > > >> >> 	struct memblock_region reg = {
> >> >> >> >>> > > >> >> 		.size = 0,
> >> >> >> >>> > > >> >> 	};
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 	of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 	if (reg.size)
> >> >> >> >>> > > >> >> 		//memblock_cap_memory_range(reg.base, reg.size); /*
> >> >> >> >>> > > >> >> comment this out */
> >> >> >> >>> > > >> >> }
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6).
However exposing all System RAM regions to the crashkernel is not > >> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >> >> >>> > > >> >> fail. > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range' > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > I still don't understand why we need to carry over the information > >> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of > >> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are > >> >> >> >>> > > >> memblock_reserve()'d now. > >> >> >> >>> > > > > >> >> >> >>> > > > For my better understandings, who is actually accessing such regions > >> >> >> >>> > > > during boot time, uefi itself or efistub? > >> >> >> >>> > > > > >> >> >> >>> > > > >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >> >> >> >>> > > instance, on QEMU we have > >> >> >> >>> > > > >> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >> >> >>> > > 01000013) > >> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > > >> >> >> >>> > > covered by > >> >> >> >>> > > > >> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >> >> >>> > > ... > >> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >> >> >>> > > >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting > >> >> >> >>> > UEFI boot services. > >> >> >> >>> > > >> >> >> >>> > > > >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >> >> >> >>> > > >> when booting the next kernel. > >> >> >> >>> > > > > >> >> >> >>> > > > not really. > >> >> >> >>> > > > > >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> >> >> >>> > > >> > on crash dump kernel?) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim > >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other > >> >> >> >>> > > >> memblock_reserve regions may be affected as well) > >> >> >> >>> > > > > >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing > >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. > >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >> >> >> >>> > > > exposed to user space (via proc/iomem). > >> >> >> >>> > > > > >> >> >> >>> > > > >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >> >> >> >>> > > as 'System RAM'. Do you think that could solve this? > >> >> >> >>> > > >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order > >> >> >> >>> > not to allocate them as part of crash kernel's memory. > >> >> >> >>> > > >> >> >> >>> > But I'm not still convinced that we should export them in useable- > >> >> >> >>> > memory-range to crash dump kernel. They will be accessed through > >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >> >> >> >>> > (or memblocks), I guess. > >> >> >> >>> > -> Bhupesh? > >> >> >> >>> > >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other > >> >> >> >>> than usable memory (which is from the dt node instead) should be > >> >> >> >>> reinitialized according to efi passed info, no? 
> >> >> >> >> > >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory > >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as > >> >> >> >> usable-memory-range by fdt_enforce_memory_region(). > >> >> >> >> > >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> >> >> >> with multiple entries in usable-memory-range. > >> >> >> >> > >> >> >> > > >> >> >> > In any case, the root of the problem is that memory regions lose their > >> >> >> > 'memory' annotation due to the way the memory map is mangled before > >> >> >> > being supplied to the kexec kernel. > >> >> >> > > >> >> >> > Would it be possible to classify all memory that we want to hide from > >> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped > >> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > >> >> >> > so this seems to be the most appropriate way to deal with the host > >> >> >> > kernel's memory contents. > >> >> >> > >> >> >> Hmm. wouldn't appending the acpi reclaim regions to > >> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel > >> >> >> be better? Because its indirectly achieving a similar objective > >> >> >> (although may be a subset of all System RAM regions on the primary > >> >> >> kernel's memory). > >> >> >> > >> >> >> I am not aware of the background about the current kexec-tools > >> >> >> implementation where we add only the crashkernel range to the dtb > >> >> >> being passed to the crashkernel. > >> >> >> > >> >> >> Probably Akashi can answer better, as to how we arrived at this design > >> >> >> approach and why we didn't want to expose all System RAM regions (i.e. > >> >> >> ! NOMPAP regions) to the crashkernel. > >> >> >> > >> >> >> I am suspecting that some issues were seen/meet when the System RAM (! 
> >> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> >> finalized on this design approach, but this is something which is just
> >> >> >> my guess.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Bhupesh
> >> >> >>
> >> >> >> >>> >
> >> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
> >> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >> >>>
> >> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >> >>> e820 table.
> >> >> >> >>
> >> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >> >>
> >> >> >> >> -Takahiro AKASHI
> >> >> >> >>
> >> >> >> >>> [snip]
> >> >> >> >>>
> >> >> >> >>> Thanks
> >> >> >> >>> Dave
> >> >> >
> >> >> > ===8<==
> >> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >> >
> >> >> > ---
> >> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> >> > index 00e7b900ca41..8175db94257b 100644
> >> >> > --- a/arch/arm64/mm/init.c
> >> >> > +++ b/arch/arm64/mm/init.c
> >> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >> >          struct memblock_region reg = {
> >> >> >                  .size = 0,
> >> >> >          };
> >> >> > +        u64 idx;
> >> >> > +        phys_addr_t start, end;
> >> >> >
> >> >> >          of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >
> >> >> > -        if (reg.size)
> >> >> > -                memblock_cap_memory_range(reg.base, reg.size);
> >> >> > +        if (reg.size) {
> >> >> > +                for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> >> > +                                        &start, &end, NULL)
> >> >> > +                        memblock_mark_nomap(start, end - start);
> >> >> > +                memblock_clear_nomap(reg.base, reg.size);
> >> >> > +        }
> >> >> >  }
> >> >> >
> >> >> >  void __init arm64_memblock_init(void)
> >> >> > --
> >> >> > 2.15.1
> >> >> >
> >> >>
> >> >> Thanks for the patch. After applying this on top of
> >> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> >> crashkernel boot no longer hangs while trying to access the acpi
> >> >> tables.
> >> >>
> >> >> However I notice a minor issue. Please see the log below for
> >> >> reference, the following message keeps spamming the console but I see
> >> >> the crashkernel boot proceed further:
> >> >>
> >> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> >> >>
> >> >> [snip..]
> >> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
> >> >
> >> > These messages show that some "struct page" data are allocated on remote
> >> > (numa) nodes.
> >> > Since on your crash dump kernel, all the usable system memory (starting
> >> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >> >
> >> > In my best guess, you can ignore them except for some performance penalty.
> >> > This may be one side-effect.
> >> >
> >> > So does your crash dump kernel now boot successfully?
> >> >
> >>
> >> Indeed. The crash dump kernel now boots successfully and the crash
> >> dump core can be saved properly as well (I tried saving it to local
> >> disk).
> >
> > Thank you for the confirmation.
> > (I'd suggest examining the core dump with the crash utility.)
> >
> >> However, the 'potential offnode page_structs' WARN messages hog the
> >> console and delay crashkernel boot for a significant duration, which
> >> can be irritating.
> >>
> >> Can we also consider ratelimiting this WARNING message [which seems to
> >> come from vmemmap_verify()] if invoked in the context of crash kernel,
> >> in addition to making the above change suggested by you.
> >
> > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > I hope that adding "numa=off" to kernel command line should also work.
>
> Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> my initial thought process as well, but I am not sure if this will
> cause any regressions on aarch64 systems which use crashdump feature.

It should be fine since we use numa=off by default for all other arches,
i.e. x86, ppc64 and s390. Actually, disabling numa in kdump kernel can
save mm component memory usage.

>
> I think the 2nd solution, i.e., limiting the warn message print
> frequency might be a better option. Can you please add the following
> patch (maybe as a separate one) and send it along with the patch which
> marks all areas other than the crashkernel region being passed to the
> crashkernel as NOMAP, so that we can get this issue fixed in upstream
> aarch64 kernel:
>
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 17acf01791fa..4c13fe3c644d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>          int actual_node = early_pfn_to_nid(pfn);
>
>          if (node_distance(actual_node, node) > LOCAL_DISTANCE)
> -                pr_warn("[%lx-%lx] potential offnode page_structs\n",
> +                pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
>                          start, end - 1);
>  }
>
> I have tested this solution on Huawei Taishan board and can boot
> crashkernel successfully and also save the crash core properly
> (without the console warn message flooding which used to hold up the
> crashkernel boot).
>
> Thanks,
> Bhupesh

Thanks
Dave
* arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP @ 2017-12-26 1:32 ` Dave Young 0 siblings, 0 replies; 135+ messages in thread From: Dave Young @ 2017-12-26 1:32 UTC (permalink / raw) To: linux-arm-kernel On 12/26/17 at 01:44am, Bhupesh Sharma wrote: > On Mon, Dec 25, 2017 at 8:55 AM, AKASHI Takahiro > <takahiro.akashi@linaro.org> wrote: > > On Sun, Dec 24, 2017 at 01:21:02AM +0530, Bhupesh Sharma wrote: > >> On Fri, Dec 22, 2017 at 2:03 PM, AKASHI Takahiro > >> <takahiro.akashi@linaro.org> wrote: > >> > On Thu, Dec 21, 2017 at 05:36:30PM +0530, Bhupesh Sharma wrote: > >> >> Hello Akashi, > >> >> > >> >> On Thu, Dec 21, 2017 at 4:04 PM, AKASHI Takahiro > >> >> <takahiro.akashi@linaro.org> wrote: > >> >> > Bhupesh, > >> >> > > >> >> > Can you test the patch attached below, please? > >> >> > > >> >> > It is intended to retain already-reserved regions (ACPI reclaim memory > >> >> > in this case) in system ram (i.e. memblock.memory) without explicitly > >> >> > exporting them via usable-memory-range. > >> >> > (I still have to figure out what the side-effect of this patch is.) 
> >> >> > > >> >> > Thanks, > >> >> > -Takahiro AKASHI > >> >> > > >> >> > On Thu, Dec 21, 2017 at 01:30:43AM +0530, Bhupesh Sharma wrote: > >> >> >> On Tue, Dec 19, 2017 at 6:39 PM, Ard Biesheuvel > >> >> >> <ard.biesheuvel@linaro.org> wrote: > >> >> >> > On 19 December 2017 at 07:09, AKASHI Takahiro > >> >> >> > <takahiro.akashi@linaro.org> wrote: > >> >> >> >> On Mon, Dec 18, 2017 at 01:40:09PM +0800, Dave Young wrote: > >> >> >> >>> On 12/15/17 at 05:59pm, AKASHI Takahiro wrote: > >> >> >> >>> > On Wed, Dec 13, 2017 at 12:17:22PM +0000, Ard Biesheuvel wrote: > >> >> >> >>> > > On 13 December 2017 at 12:16, AKASHI Takahiro > >> >> >> >>> > > <takahiro.akashi@linaro.org> wrote: > >> >> >> >>> > > > On Wed, Dec 13, 2017 at 10:49:27AM +0000, Ard Biesheuvel wrote: > >> >> >> >>> > > >> On 13 December 2017 at 10:26, AKASHI Takahiro > >> >> >> >>> > > >> <takahiro.akashi@linaro.org> wrote: > >> >> >> >>> > > >> > Bhupesh, Ard, > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > On Wed, Dec 13, 2017 at 03:21:59AM +0530, Bhupesh Sharma wrote: > >> >> >> >>> > > >> >> Hi Ard, Akashi > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> > (snip) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> >> Looking deeper into the issue, since the arm64 kexec-tools uses the > >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt property to allow crash dump kernel to > >> >> >> >>> > > >> >> identify its own usable memory and exclude, at its boot time, any > >> >> >> >>> > > >> >> other memory areas that are part of the panicked kernel's memory. > >> >> >> >>> > > >> >> (see https://www.kernel.org/doc/Documentation/devicetree/bindings/chosen.txt > >> >> >> >>> > > >> >> , for details) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > Right. > >> >> >> >>> > > >> > > >> >> >> >>> > > >> >> 1). 
Now when 'kexec -p' is executed, this node is patched up only > >> >> >> >>> > > >> >> with the crashkernel memory range: > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> /* add linux,usable-memory-range */ > >> >> >> >>> > > >> >> nodeoffset = fdt_path_offset(new_buf, "/chosen"); > >> >> >> >>> > > >> >> result = fdt_setprop_range(new_buf, nodeoffset, > >> >> >> >>> > > >> >> PROP_USABLE_MEM_RANGE, &crash_reserved_mem, > >> >> >> >>> > > >> >> address_cells, size_cells); > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> (see https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n465 > >> >> >> >>> > > >> >> , for details) > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 2). This excludes the ACPI reclaim regions irrespective of whether > >> >> >> >>> > > >> >> they are marked as System RAM or as RESERVED. As, > >> >> >> >>> > > >> >> 'linux,usable-memory-range' dt node is patched up only with > >> >> >> >>> > > >> >> 'crash_reserved_mem' and not 'system_memory_ranges' > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 3). As a result when the crashkernel boots up it doesn't find this > >> >> >> >>> > > >> >> ACPI memory and crashes while trying to access the same: > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname > >> >> >> >>> > > >> >> -r`.img --reuse-cmdline -d > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> [snip..] 
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Reserved memory range
> >> >> >> >>> > > >> >> 000000000e800000-000000002e7fffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> Coredump memory ranges
> >> >> >> >>> > > >> >> 0000000000000000-000000000e7fffff (0)
> >> >> >> >>> > > >> >> 000000002e800000-000000003961ffff (0)
> >> >> >> >>> > > >> >> 0000000039d40000-000000003ed2ffff (0)
> >> >> >> >>> > > >> >> 000000003ed60000-000000003fbfffff (0)
> >> >> >> >>> > > >> >> 0000001040000000-0000001ffbffffff (0)
> >> >> >> >>> > > >> >> 0000002000000000-0000002ffbffffff (0)
> >> >> >> >>> > > >> >> 0000009000000000-0000009ffbffffff (0)
> >> >> >> >>> > > >> >> 000000a000000000-000000affbffffff (0)
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 4). So if we revert Ard's patch or just comment the fixing up of the
> >> >> >> >>> > > >> >> memory cap'ing passed to the crash kernel inside
> >> >> >> >>> > > >> >> 'arch/arm64/mm/init.c' (see below):
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> static void __init fdt_enforce_memory_region(void)
> >> >> >> >>> > > >> >> {
> >> >> >> >>> > > >> >>         struct memblock_region reg = {
> >> >> >> >>> > > >> >>                 .size = 0,
> >> >> >> >>> > > >> >>         };
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >>         if (reg.size)
> >> >> >> >>> > > >> >>                 //memblock_cap_memory_range(reg.base, reg.size); /* comment this out */
> >> >> >> >>> > > >> >> }
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> > Please just don't do that. It can cause a fatal damage on
> >> >> >> >>> > > >> > memory contents of the *crashed* kernel.
> >> >> >> >>> > > >> >
> >> >> >> >>> > > >> >> 5). Both the above temporary solutions fix the problem.
> >> >> >> >>> > > >> >>
> >> >> >> >>> > > >> >> 6).
However exposing all System RAM regions to the crashkernel is not > >> >> >> >>> > > >> >> advisable and may cause the crashkernel or some crashkernel drivers to > >> >> >> >>> > > >> >> fail. > >> >> >> >>> > > >> >> > >> >> >> >>> > > >> >> 6a). I am trying an approach now, where the ACPI reclaim regions are > >> >> >> >>> > > >> >> added to '/proc/iomem' separately as ACPI reclaim regions by the > >> >> >> >>> > > >> >> kernel code and on the other hand the user-space 'kexec-tools' will > >> >> >> >>> > > >> >> pick up the ACPI reclaim regions from '/proc/iomem' and add it to the > >> >> >> >>> > > >> >> dt node 'linux,usable-memory-range' > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > I still don't understand why we need to carry over the information > >> >> >> >>> > > >> > about "ACPI Reclaim memory" to crash dump kernel. In my understandings, > >> >> >> >>> > > >> > such regions are free to be reused by the kernel after some point of > >> >> >> >>> > > >> > initialization. Why does crash dump kernel need to know about them? > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > >> >> >> >>> > > >> Not really. According to the UEFI spec, they can be reclaimed after > >> >> >> >>> > > >> the OS has initialized, i.e., when it has consumed the ACPI tables and > >> >> >> >>> > > >> no longer needs them. Of course, in order to be able to boot a kexec > >> >> >> >>> > > >> kernel, those regions needs to be preserved, which is why they are > >> >> >> >>> > > >> memblock_reserve()'d now. > >> >> >> >>> > > > > >> >> >> >>> > > > For my better understandings, who is actually accessing such regions > >> >> >> >>> > > > during boot time, uefi itself or efistub? > >> >> >> >>> > > > > >> >> >> >>> > > > >> >> >> >>> > > No, only the kernel. This is where the ACPI tables are stored. 
For > >> >> >> >>> > > instance, on QEMU we have > >> >> >> >>> > > > >> >> >> >>> > > ACPI: RSDP 0x0000000078980000 000024 (v02 BOCHS ) > >> >> >> >>> > > ACPI: XSDT 0x0000000078970000 000054 (v01 BOCHS BXPCFACP 00000001 > >> >> >> >>> > > 01000013) > >> >> >> >>> > > ACPI: FACP 0x0000000078930000 00010C (v05 BOCHS BXPCFACP 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: DSDT 0x0000000078940000 0011DA (v02 BOCHS BXPCDSDT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: APIC 0x0000000078920000 000140 (v03 BOCHS BXPCAPIC 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: GTDT 0x0000000078910000 000060 (v02 BOCHS BXPCGTDT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: MCFG 0x0000000078900000 00003C (v01 BOCHS BXPCMCFG 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: SPCR 0x00000000788F0000 000050 (v02 BOCHS BXPCSPCR 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > ACPI: IORT 0x00000000788E0000 00007C (v00 BOCHS BXPCIORT 00000001 > >> >> >> >>> > > BXPC 00000001) > >> >> >> >>> > > > >> >> >> >>> > > covered by > >> >> >> >>> > > > >> >> >> >>> > > efi: 0x0000788e0000-0x00007894ffff [ACPI Reclaim Memory ...] > >> >> >> >>> > > ... > >> >> >> >>> > > efi: 0x000078970000-0x00007898ffff [ACPI Reclaim Memory ...] > >> >> >> >>> > > >> >> >> >>> > OK. I mistakenly understood those regions could be freed after exiting > >> >> >> >>> > UEFI boot services. > >> >> >> >>> > > >> >> >> >>> > > > >> >> >> >>> > > >> So it seems that kexec does not honour the memblock_reserve() table > >> >> >> >>> > > >> when booting the next kernel. > >> >> >> >>> > > > > >> >> >> >>> > > > not really. > >> >> >> >>> > > > > >> >> >> >>> > > >> > (In other words, can or should we skip some part of ACPI-related init code > >> >> >> >>> > > >> > on crash dump kernel?) > >> >> >> >>> > > >> > > >> >> >> >>> > > >> > >> >> >> >>> > > >> I don't think so. 
And the change to the handling of ACPI reclaim > >> >> >> >>> > > >> regions only revealed the bug, not created it (given that other > >> >> >> >>> > > >> memblock_reserve regions may be affected as well) > >> >> >> >>> > > > > >> >> >> >>> > > > As whether we should honor such reserved regions over kexec'ing > >> >> >> >>> > > > depends on each one's specific nature, we will have to take care one-by-one. > >> >> >> >>> > > > As a matter of fact, no information about "reserved" memblocks is > >> >> >> >>> > > > exposed to user space (via proc/iomem). > >> >> >> >>> > > > > >> >> >> >>> > > > >> >> >> >>> > > That is why I suggested (somewhere in this thread?) to not expose them > >> >> >> >>> > > as 'System RAM'. Do you think that could solve this? > >> >> >> >>> > > >> >> >> >>> > Memblock-reserv'ing them is necessary to prevent their corruption and > >> >> >> >>> > marking them under another name in /proc/iomem would also be good in order > >> >> >> >>> > not to allocate them as part of crash kernel's memory. > >> >> >> >>> > > >> >> >> >>> > But I'm not still convinced that we should export them in useable- > >> >> >> >>> > memory-range to crash dump kernel. They will be accessed through > >> >> >> >>> > acpi_os_map_memory() and so won't be required to be part of system ram > >> >> >> >>> > (or memblocks), I guess. > >> >> >> >>> > -> Bhupesh? > >> >> >> >>> > >> >> >> >>> I forgot how arm64 kernel retrieve the memory ranges and initialize > >> >> >> >>> them. If no "e820" like interfaces shouldn't kernel reinitialize all > >> >> >> >>> the memory according to the efi memmap? For kdump kernel anything other > >> >> >> >>> than usable memory (which is from the dt node instead) should be > >> >> >> >>> reinitialized according to efi passed info, no? 
> >> >> >> >> > >> >> >> >> All the regions exported in efi memmap will be added to memblock.memory > >> >> >> >> in (u)efi_init() and then trimmed down to the exact range specified as > >> >> >> >> usable-memory-range by fdt_enforce_memory_region(). > >> >> >> >> > >> >> >> >> Now I noticed that the current fdt_enforce_memory_region() may not work well > >> >> >> >> with multiple entries in usable-memory-range. > >> >> >> >> > >> >> >> > > >> >> >> > In any case, the root of the problem is that memory regions lose their > >> >> >> > 'memory' annotation due to the way the memory map is mangled before > >> >> >> > being supplied to the kexec kernel. > >> >> >> > > >> >> >> > Would it be possible to classify all memory that we want to hide from > >> >> >> > the kexec kernel as NOMAP instead? That way, it will not be mapped > >> >> >> > implicitly, but will still be mapped cacheable by acpi_os_ioremap(), > >> >> >> > so this seems to be the most appropriate way to deal with the host > >> >> >> > kernel's memory contents. > >> >> >> > >> >> >> Hmm. wouldn't appending the acpi reclaim regions to > >> >> >> 'linux,usable-memory-range' in the dtb being passed to the crashkernel > >> >> >> be better? Because its indirectly achieving a similar objective > >> >> >> (although may be a subset of all System RAM regions on the primary > >> >> >> kernel's memory). > >> >> >> > >> >> >> I am not aware of the background about the current kexec-tools > >> >> >> implementation where we add only the crashkernel range to the dtb > >> >> >> being passed to the crashkernel. > >> >> >> > >> >> >> Probably Akashi can answer better, as to how we arrived at this design > >> >> >> approach and why we didn't want to expose all System RAM regions (i.e. > >> >> >> ! NOMPAP regions) to the crashkernel. > >> >> >> > >> >> >> I am suspecting that some issues were seen/meet when the System RAM (! 
> >> >> >> NOMAP regions) were exposed to the crashkernel, and that's why we
> >> >> >> finalized on this design approach, but this is something which is just
> >> >> >> my guess.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Bhupesh
> >> >> >>
> >> >> >> >>> >
> >> >> >> >>> > Just FYI, on x86, ACPI tables seem to be exposed to crash dump kernel
> >> >> >> >>> > via a kernel command line parameter, "memmap=".
> >> >> >> >>>
> >> >> >> >>> memmap= is only used in old kexec-tools, now we are passing them via
> >> >> >> >>> e820 table.
> >> >> >> >>
> >> >> >> >> Thanks. I remember that you have explained it before.
> >> >> >> >>
> >> >> >> >> -Takahiro AKASHI
> >> >> >> >>
> >> >> >> >>> [snip]
> >> >> >> >>>
> >> >> >> >>> Thanks
> >> >> >> >>> Dave
> >> >> >
> >> >> > ===8<==
> >> >> > From 74e2451fea83d546feae76160ba7de426913fe03 Mon Sep 17 00:00:00 2001
> >> >> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
> >> >> > Date: Thu, 21 Dec 2017 19:14:23 +0900
> >> >> > Subject: [PATCH] arm64: kdump: mark unusable memory as NOMAP
> >> >> >
> >> >> > ---
> >> >> >  arch/arm64/mm/init.c | 10 ++++++++--
> >> >> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >> >> >
> >> >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> >> > index 00e7b900ca41..8175db94257b 100644
> >> >> > --- a/arch/arm64/mm/init.c
> >> >> > +++ b/arch/arm64/mm/init.c
> >> >> > @@ -352,11 +352,17 @@ static void __init fdt_enforce_memory_region(void)
> >> >> >         struct memblock_region reg = {
> >> >> >                 .size = 0,
> >> >> >         };
> >> >> > +       u64 idx;
> >> >> > +       phys_addr_t start, end;
> >> >> >
> >> >> >         of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
> >> >> >
> >> >> > -       if (reg.size)
> >> >> > -               memblock_cap_memory_range(reg.base, reg.size);
> >> >> > +       if (reg.size) {
> >> >> > +               for_each_free_mem_range(idx, NUMA_NO_NODE, MEMBLOCK_NONE,
> >> >> > +                                       &start, &end, NULL)
> >> >> > +                       memblock_mark_nomap(start, end - start);
> >> >> > +               memblock_clear_nomap(reg.base, reg.size);
> >> >> > +       }
> >> >> >  }
> >> >> >
> >> >> >  void __init arm64_memblock_init(void)
> >> >> > --
> >> >> > 2.15.1
> >> >> >
> >> >>
> >> >> Thanks for the patch. After applying this on top of
> >> >> 4.15.0-rc4-next-20171220, there seems to be an improvement and the
> >> >> crashkernel boot no longer hangs while trying to access the acpi
> >> >> tables.
> >> >>
> >> >> However I notice a minor issue. Please see the log below for
> >> >> reference, the following message keeps spamming the console but I see
> >> >> the crashkernel boot proceed further:
> >> >>
> >> >> [    0.000000] ACPI: NUMA: SRAT: PXM 3 -> MPIDR 0x70303 -> Node 3
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x3fffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x2000000000-0x2fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0x1fffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0xa000000000-0xafffffffff]
> >> >> [    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x9000000000-0x9fffffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffe200-0x1ffbffffff]
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffc400-0x1ffbffe1ff]
> >> >> [    0.000000] NUMA: NODE_DATA(1) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbffa600-0x1ffbffc3ff]
> >> >> [    0.000000] NUMA: NODE_DATA(2) on node 0
> >> >> [    0.000000] NUMA: NODE_DATA [mem 0x1ffbff8800-0x1ffbffa5ff]
> >> >> [    0.000000] NUMA: NODE_DATA(3) on node 0
> >> >> [    0.000000] [ffff7fe008000000-ffff7fe00800ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008010000-ffff7fe00801ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008020000-ffff7fe00802ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008030000-ffff7fe00803ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008040000-ffff7fe00804ffff] potential offnode page_structs
> >> >> [    0.000000] [ffff7fe008050000-ffff7fe00805ffff] potential offnode page_structs
> >> >>
> >> >> [snip..]
> >> >> [    0.000000] [ffff7fe0081f0000-ffff7fe0081fffff] potential offnode page_structs
> >> >
> >> > These messages show that some "struct page" data are allocated on remote
> >> > (numa) nodes.
> >> > Since on your crash dump kernel, all the usable system memory (starting
> >> > 0x0e800000) belongs to Node#0, we can't avoid such non-local allocations.
> >> >
> >> > In my best guess, you can ignore them except for some performance penalty.
> >> > This may be one side-effect.
> >> >
> >> > So does your crash dump kernel now boot successfully?
> >> >
> >>
> >> Indeed. The crash dump kernel now boots successfully and the crash
> >> dump core can be saved properly as well (I tried saving it to local
> >> disk).
> >
> > Thank you for the confirmation.
> > (I'd like to suggest you to examine the core dump with crash utility.)
> >
> >> However, the 'potential offnode page_structs' WARN messages hog the
> >> console and delay crashkernel boot for a significant duration, which
> >> can be irritating.
> >>
> >> Can we also consider ratelimiting this WARNING message [which seems to
> >> come from vmemmap_verify()] if invoked in the context of crash kernel,
> >> in addition to making the above change suggested by you.
> >
> > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > I hope that adding "numa=off" to kernel command line should also work.
>
> Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> my initial thought process as well, but I am not sure if this will
> cause any regressions on aarch64 systems which use crashdump feature.

It should be fine since we use numa=off by default for all other arches
i.e. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
mm component memory usage.

>
> I think the 2nd solution, i.e. limiting the warn message print
> frequency might be a better option. Can you please add the following
> patch (maybe as a separate one) and send it along with the patch which
> marks all areas other than the crashkernel region being passed to the
> crashkernel as NOMAP, so that we can get this issue fixed in upstream
> aarch64 kernel:
>
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 17acf01791fa..4c13fe3c644d 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -169,7 +169,7 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>         int actual_node = early_pfn_to_nid(pfn);
>
>         if (node_distance(actual_node, node) > LOCAL_DISTANCE)
> -               pr_warn("[%lx-%lx] potential offnode page_structs\n",
> +               pr_warn_once("[%lx-%lx] potential offnode page_structs\n",
>                         start, end - 1);
>  }
>
> I have tested this solution on Huawei Taishan board and can boot
> crashkernel successfully and also save the crash core properly
> (without the console warn message flooding which used to hold up the
> crashkernel boot).
>
> Thanks,
> Bhupesh

Thanks
Dave

^ permalink raw reply [flat|nested] 135+ messages in thread
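To illustrate what switching from pr_warn() to pr_warn_once() would change in practice, here is a minimal user-space sketch of a print-once helper. It is not the kernel's implementation: warn_once(), warn_emitted and report_offnode() are hypothetical names made up for this example, and fprintf() stands in for the kernel's printk machinery. The only point it shows is that a per-call-site static flag lets the first message through and silently drops the rest.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so a caller can observe how many lines were printed. */
static int warn_emitted;

/*
 * User-space stand-in for a print-once helper (an assumption, not kernel
 * code): every expansion site gets its own static flag, so the message is
 * printed only the first time that site is reached.
 */
#define warn_once(fmt, ...)                                 \
	do {                                                \
		static bool warned_;                        \
		if (!warned_) {                             \
			warned_ = true;                     \
			warn_emitted++;                     \
			fprintf(stderr, fmt, __VA_ARGS__);  \
		}                                           \
	} while (0)

/* Hypothetical caller modelled on the message quoted in the diff above. */
static void report_offnode(unsigned long start, unsigned long end)
{
	warn_once("[%lx-%lx] potential offnode page_structs\n", start, end - 1);
}
```

Calling report_offnode() for many ranges prints a single line, which is also the drawback raised later in the thread: every range after the first goes unreported.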
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 1:32 ` Dave Young
  (?)
@ 2017-12-26 1:35 ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26 1:35 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: AKASHI Takahiro, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
      linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

[snip]
> > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > I hope that adding "numa=off" to kernel command line should also work.
> >
> > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > my initial thought process as well, but I am not sure if this will
> > cause any regressions on aarch64 systems which use crashdump feature.
>
> It should be fine since we use numa=off by default for all other arches
> i.e. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> mm component memory usage.
>

Forgot to say, I mean that in RHEL and Fedora we use numa=off for kdump.

^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 1:35 ` Dave Young
  (?)
@ 2017-12-26 2:28 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2017-12-26 2:28 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
      linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> [snip]
> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > I hope that adding "numa=off" to kernel command line should also work.
> > >
> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > my initial thought process as well, but I am not sure if this will
> > > cause any regressions on aarch64 systems which use crashdump feature.
> >
> > It should be fine since we use numa=off by default for all other arches
> > i.e. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > mm component memory usage.
> >
>
> Forgot to say, I mean that in RHEL and Fedora we use numa=off for kdump.

Thank you for the clarification.
(It might be better to make numa off automatically if maxcpus == 0 (and 1?).)

-Takahiro AKASHI

^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 2:28 ` AKASHI Takahiro
  (?)
@ 2017-12-26 2:56 ` Bhupesh Sharma
  -1 siblings, 0 replies; 135+ messages in thread
From: Bhupesh Sharma @ 2017-12-26 2:56 UTC (permalink / raw)
  To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel,
      Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi,
      Mark Rutland, James Morse, kexec

On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>> [snip]
>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>> > > > I hope that adding "numa=off" to kernel command line should also work.
>> > >
>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>> > > my initial thought process as well, but I am not sure if this will
>> > > cause any regressions on aarch64 systems which use crashdump feature.
>> >
>> > It should be fine since we use numa=off by default for all other arches
>> > i.e. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
>> > mm component memory usage.
>> >
>>
>> Forgot to say, I mean that in RHEL and Fedora we use numa=off for kdump.
>
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>

Not sure if we can leave this to the distribution-specific kdump
scripts (as the crashkernel boot can be held up for sufficient time
and may appear stuck). The distribution scripts may be different
(e.g. Ubuntu and RHEL/Fedora) across distributions and may have
different bootarg options.

So how about considering a kernel fix only which doesn't require
relying on changing the distribution-specific kdump scripts, as we
should avoid introducing a regression while trying to fix a regression
:)

Just my 2 cents.

Thanks,
Bhupesh

^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 2:56 ` Bhupesh Sharma
  (?)
@ 2017-12-26 6:58 ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-12-26 6:58 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: AKASHI Takahiro, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
      linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On 12/26/17 at 08:26am, Bhupesh Sharma wrote:
> On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
> > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> >> [snip]
> >> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> >> > > > I hope that adding "numa=off" to kernel command line should also work.
> >> > >
> >> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> >> > > my initial thought process as well, but I am not sure if this will
> >> > > cause any regressions on aarch64 systems which use crashdump feature.
> >> >
> >> > It should be fine since we use numa=off by default for all other arches
> >> > i.e. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> >> > mm component memory usage.
> >> >
> >>
> >> Forgot to say, I mean that in RHEL and Fedora we use numa=off for kdump.
> >
> > Thank you for the clarification.
> > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> >
>
> Not sure if we can leave this to the distribution-specific kdump
> scripts (as the crashkernel boot can be held up for sufficient time
> and may appear stuck). The distribution scripts may be different
> (e.g. Ubuntu and RHEL/Fedora) across distributions and may have
> different bootarg options.

Personally I think the distribution should take care of this param for
kdump. But as AKASHI said it could be an issue for the 1st kernel when
booting with nr_cpus=1. The question is why we do not see this issue on
other machines.

>
> So how about considering a kernel fix only which doesn't require
> relying on changing the distribution-specific kdump scripts, as we
> should avoid introducing a regression while trying to fix a regression
> :)
>
> Just my 2 cents.
>
> Thanks,
> Bhupesh

Thanks
Dave

^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 6:58 ` Dave Young
  (?)
@ 2018-01-09 5:22 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2018-01-09 5:22 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming,
      linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On Tue, Dec 26, 2017 at 02:58:45PM +0800, Dave Young wrote:
> On 12/26/17 at 08:26am, Bhupesh Sharma wrote:
> > On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> > > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > >> [snip]
> > >> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > >> > > > I hope that adding "numa=off" to kernel command line should also work.
> > >> > >
> > >> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > >> > > my initial thought process as well, but I am not sure if this will
> > >> > > cause any regressions on aarch64 systems which use crashdump feature.
> > >> >
> > >> > It should be fine since we use numa=off by default for all other arches
> > >> > i.e. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > >> > mm component memory usage.
> > >> >
> > >>
> > >> Forgot to say, I mean that in RHEL and Fedora we use numa=off for kdump.
> > >
> > > Thank you for the clarification.
> > > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> > >
> >
> > Not sure if we can leave this to the distribution-specific kdump
> > scripts (as the crashkernel boot can be held up for sufficient time
> > and may appear stuck). The distribution scripts may be different
> > (e.g. Ubuntu and RHEL/Fedora) across distributions and may have
> > different bootarg options.
>
> Personally I think the distribution should take care of this param for
> kdump. But as AKASHI said it could be an issue for the 1st kernel when
> booting with nr_cpus=1. The question is why we do not see this issue on
> other machines.

The issue won't be kdump-specific. Theoretically, it also takes place
when "mem=" is specified on a NUMA system.
Since we can avoid the annoying messages by adding "numa=off", I'm
reluctant to suppress all of the messages but the first.
My suggestion here is to add some notes in Documentation/kdump/kdump.txt
regarding the NUMA case.

Thanks,
Takahiro AKASHI

> >
> > So how about considering a kernel fix only which doesn't require
> > relying on changing the distribution-specific kdump scripts, as we
> > should avoid introducing a regression while trying to fix a regression
> > :)
> >
> > Just my 2 cents.
> >
> > Thanks,
> > Bhupesh
>
> Thanks
> Dave

^ permalink raw reply [flat|nested] 135+ messages in thread
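The point that the warning is not kdump-specific can be seen from the shape of the check quoted earlier in the thread, node_distance(actual_node, node) > LOCAL_DISTANCE. The sketch below is a user-space model of that comparison only. The distance values (10 for the same node, 20 otherwise) and all model_* names are assumptions made for illustration and do not come from any real firmware table. It counts how many vmemmap blocks would trigger the message when every block has to be backed by memory on a single usable node, which is the situation whenever usable memory is capped to one node.

```c
#include <stddef.h>

/* Assumed distance values for this model; real systems get theirs from firmware. */
#define MODEL_LOCAL_DISTANCE  10
#define MODEL_REMOTE_DISTANCE 20

/* Hypothetical two-level distance table: same node is local, anything else remote. */
static int model_node_distance(int a, int b)
{
	return a == b ? MODEL_LOCAL_DISTANCE : MODEL_REMOTE_DISTANCE;
}

/* Same comparison shape as the check quoted from vmemmap_verify(). */
static int model_is_offnode(int backing_node, int described_node)
{
	return model_node_distance(backing_node, described_node) >
	       MODEL_LOCAL_DISTANCE;
}

/*
 * Count the blocks that would warn if every vmemmap block is allocated from
 * usable_node, while block i describes pages that live on described_nodes[i].
 */
static int model_count_offnode(const int *described_nodes, size_t n,
			       int usable_node)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (model_is_offnode(usable_node, described_nodes[i]))
			count++;
	return count;
}
```

In this model every block that describes memory on a node other than the usable one is reported, regardless of whether the restriction came from a crash kernel reservation or from a "mem=" limit.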
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 2:56 ` Bhupesh Sharma
@ 2018-01-08 20:00 ` Bhupesh Sharma
From: Bhupesh Sharma @ 2018-01-08 20:00 UTC
To: AKASHI Takahiro, Dave Young, Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

Hello Akashi,

On Tue, Dec 26, 2017 at 8:26 AM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> <takahiro.akashi@linaro.org> wrote:
>> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>>> [snip]
>>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>>> > > > I hope that adding "numa=off" to kernel command line should also work.
>>> > >
>>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>>> > > my initial thought process as well, but I am not sure if this will
>>> > > cause any regressions on aarch64 systems which use crashdump feature.
>>> >
>>> > It should be fine since we use numa=off by default for all other arches
>>> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
>>> > mm component memory usage.
>>> >
>>>
>>> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
>>
>> Thank you for the clarification.
>> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>>
>
> Not sure if we can leave this to the distribution-specific kdump
> scripts (as the crashkernel boot can be held up for sufficient time
> and may appear stuck). The distribution scripts may be different (for
> e.g. ubuntu and RHEL/fedora) across distributions and may have
> different bootarg options.
>
> So how about considering a kernel fix only which doesn't require
> relying on changing the distribution-specific kdump scripts, as we
> should avoid introducing a regression while trying to fix a regression
> :)
>
> Just my 2 cents.
>

Sorry for the delay, but I was on holiday last week.

Are you planning to send a patch to fix this issue, or do you want me
to send an RFC version instead?

I think this is a blocking issue for aarch64 kdump support on newer
kernels (v4.14), and we are already hearing about this issue from other
users as well, so it would be great to get this fixed now that we have
root-caused the issue and found a possible workaround.

Regards,
Bhupesh
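[Illustration] For readers weighing the "distribution kdump scripts" option discussed above: such a script would need to make sure "numa=off" ends up on the crash kernel's command line. The sketch below is hypothetical (the function name ensure_numa_off and the sample command lines are assumptions, not taken from any distribution's kdump tooling) and covers only the string handling, not how a real kdump service loads the kernel.

```python
def ensure_numa_off(cmdline: str) -> str:
    """Return cmdline with numa=off set exactly once.

    Hypothetical helper: it only rewrites the string and knows nothing
    about how a real kdump service assembles or passes its arguments.
    """
    # Drop any existing numa=... setting so the result is unambiguous.
    params = [p for p in cmdline.split() if not p.startswith("numa=")]
    params.append("numa=off")
    return " ".join(params)


# Example: a crash kernel command line that did not mention NUMA at all.
crash_cmdline = ensure_numa_off("root=/dev/sda1 maxcpus=1")
```

The helper is idempotent, so a script can call it unconditionally. The concern raised in the thread still applies: every distribution would have to carry an equivalent step, whereas a kernel-side fix would not depend on it.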
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2018-01-08 20:00 ` Bhupesh Sharma
@ 2018-01-09 4:42 ` AKASHI Takahiro
From: AKASHI Takahiro @ 2018-01-09 4:42 UTC
To: Bhupesh Sharma
Cc: Dave Young, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

Bhupesh,

On Tue, Jan 09, 2018 at 01:30:07AM +0530, Bhupesh Sharma wrote:
> Hello Akashi,
>
> On Tue, Dec 26, 2017 at 8:26 AM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> > On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
> > <takahiro.akashi@linaro.org> wrote:
> >> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> >>> [snip]
> >>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> >>> > > > I hope that adding "numa=off" to kernel command line should also work.
> >>> > >
> >>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> >>> > > my initial thought process as well, but I am not sure if this will
> >>> > > cause any regressions on aarch64 systems which use crashdump feature.
> >>> >
> >>> > It should be fine since we use numa=off by default for all other arches
> >>> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> >>> > mm component memory usage.
> >>> >
> >>>
> >>> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> >>
> >> Thank you for the clarification.
> >> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
> >>
> >
> > Not sure if we can leave this to the distribution-specific kdump
> > scripts (as the crashkernel boot can be held up for sufficient time
> > and may appear stuck). The distribution scripts may be different (for
> > e.g. ubuntu and RHEL/fedora) across distributions and may have
> > different bootarg options.
> >
> > So how about considering a kernel fix only which doesn't require
> > relying on changing the distribution-specific kdump scripts, as we
> > should avoid introducing a regression while trying to fix a regression
> > :)
> >
> > Just my 2 cents.
> >
>
> Sorry for the delay but I was on holidays in the last week.
>
> Are you planning to send a patch to fix this issue or do you want me
> to send a RFC version instead?

I should have submitted my own patch before my new year holidays,
but I will do so as soon as possible.

Thanks,
-Takahiro AKASHI

> i think this is a blocking issue for aarch64 kdump support on newer
> kernels (v4.14) and we are already hearing about this issue from other
> users as well, so it would be great to get this fixed now that we have
> root-caused the issue and found a possible way around.
>
> Regards,
> Bhupesh
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2018-01-09 4:42 ` AKASHI Takahiro
@ 2018-01-09 11:46 ` Bhupesh Sharma
From: Bhupesh Sharma @ 2018-01-09 11:46 UTC
To: AKASHI Takahiro, Bhupesh Sharma, Dave Young, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On Tue, Jan 9, 2018 at 10:12 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Bhupesh,
>
> On Tue, Jan 09, 2018 at 01:30:07AM +0530, Bhupesh Sharma wrote:
>> Hello Akashi,
>>
>> On Tue, Dec 26, 2017 at 8:26 AM, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>> > On Tue, Dec 26, 2017 at 7:58 AM, AKASHI Takahiro
>> > <takahiro.akashi@linaro.org> wrote:
>> >> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
>> >>> [snip]
>> >>> > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
>> >>> > > > I hope that adding "numa=off" to kernel command line should also work.
>> >>> > >
>> >>> > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
>> >>> > > my initial thought process as well, but I am not sure if this will
>> >>> > > cause any regressions on aarch64 systems which use crashdump feature.
>> >>> >
>> >>> > It should be fine since we use numa=off by default for all other arches
>> >>> > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
>> >>> > mm component memory usage.
>> >>> >
>> >>>
>> >>> Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
>> >>
>> >> Thank you for the clarification.
>> >> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>> >>
>> >
>> > Not sure if we can leave this to the distribution-specific kdump
>> > scripts (as the crashkernel boot can be held up for sufficient time
>> > and may appear stuck). The distribution scripts may be different (for
>> > e.g. ubuntu and RHEL/fedora) across distributions and may have
>> > different bootarg options.
>> >
>> > So how about considering a kernel fix only which doesn't require
>> > relying on changing the distribution-specific kdump scripts, as we
>> > should avoid introducing a regression while trying to fix a regression
>> > :)
>> >
>> > Just my 2 cents.
>> >
>>
>> Sorry for the delay but I was on holidays in the last week.
>>
>> Are you planning to send a patch to fix this issue or do you want me
>> to send a RFC version instead?
>
> I should have submitted my own patch before my new year holidays,
> but I will do so as soon as possible.

Thanks for the confirmation. I will look forward to the patches and
give them a go on the arm64 boards available to me.

Regards,
Bhupesh

>
>> i think this is a blocking issue for aarch64 kdump support on newer
>> kernels (v4.14) and we are already hearing about this issue from other
>> users as well, so it would be great to get this fixed now that we have
>> root-caused the issue and found a possible way around.
>>
>> Regards,
>> Bhupesh
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 2:28 ` AKASHI Takahiro
@ 2017-12-26 6:56 ` Dave Young
From: Dave Young @ 2017-12-26 6:56 UTC
To: AKASHI Takahiro, Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On 12/26/17 at 11:28am, AKASHI Takahiro wrote:
> On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > [snip]
> > > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > > I hope that adding "numa=off" to kernel command line should also work.
> > > >
> > > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > > my initial thought process as well, but I am not sure if this will
> > > > cause any regressions on aarch64 systems which use crashdump feature.
> > >
> > > It should be fine since we use numa=off by default for all other arches
> > > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > > mm component memory usage.
> > >
> >
> > Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
>
> Thank you for the clarification.
> (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)

Hmm, I did a quick test with qemu/kvm, booting the kdump kernel without
numa=off. I'm not sure why I do not see the warning messages on x86
machines; maybe it is something arm64-specific?

>
> -Takahiro AKASHI
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-12-26 6:56 ` Dave Young
  (?)
@ 2018-01-09 5:02 ` AKASHI Takahiro
  -1 siblings, 0 replies; 135+ messages in thread
From: AKASHI Takahiro @ 2018-01-09 5:02 UTC (permalink / raw)
  To: Dave Young
  Cc: Bhupesh Sharma, Ard Biesheuvel, Bhupesh SHARMA, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, kexec

On Tue, Dec 26, 2017 at 02:56:36PM +0800, Dave Young wrote:
> On 12/26/17 at 11:28am, AKASHI Takahiro wrote:
> > On Tue, Dec 26, 2017 at 09:35:17AM +0800, Dave Young wrote:
> > > [snip]
> > > > > > Well, we may be able to change pr_warn() to pr_warn_once() here, but
> > > > > > I hope that adding "numa=off" to kernel command line should also work.
> > > > >
> > > > > Hmm, adding "numa=off" to crashkernel bootargs works, and TBH it was
> > > > > my initial thought process as well, but I am not sure if this will
> > > > > cause any regressions on aarch64 systems which use crashdump feature.
> > > >
> > > > It should be fine since we use numa=off by default for all other arches
> > > > ie. x86, ppc64 and s390. Actually disabling numa in kdump kernel can save
> > > > mm component memory usage.
> > > >
> > >
> > > Forgot to say I means in RHEL and Fedora we use numa=off for kdump..
> >
> > Thank you for the clarification.
> > (It might be better to make numa off automatically if maxcpus == 0 (and 1?).)
>
> Hmm, I did a quick test with qemu/kvm, kdump kernel boot without numa=off
> I'm not sure why I do not see the warning messages on x86
> machines, maybe something arm64 specific?

I didn't see the messages (i.e. "potential offnode page_structs") on arm64 qemu
(with -smp 2 -numa node -numa node).
It seems that qemu doesn't generate an ACPI SLIT (inter-node distance table).

Thanks,
-Takahiro AKASHI

> >
> > -Takahiro AKASHI

^ permalink raw reply [flat|nested] 135+ messages in thread
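[Editorial sketch, not part of the original message.] The pr_warn() versus pr_warn_once() suggestion quoted in this message can be illustrated with a small userspace model of warn-once behaviour. The names warn_once, warn_count and check_page_structs are hypothetical and exist only for this illustration; the message text is borrowed from the thread, and none of this is the kernel's actual implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so a caller can observe how many warnings were emitted. */
static int warn_count;

/*
 * Userspace model of a warn-once macro: every expansion site gets its own
 * static flag, so the message is printed only the first time that site runs.
 */
#define warn_once(...)                        \
	do {                                  \
		static bool warned_;          \
		if (!warned_) {               \
			warned_ = true;       \
			warn_count++;         \
			fprintf(stderr, __VA_ARGS__); \
		}                             \
	} while (0)

/* Hypothetical caller standing in for a check that runs once per node. */
static void check_page_structs(int nid)
{
	warn_once("node %d: potential offnode page_structs\n", nid);
}
```

Calling check_page_structs() for several nodes prints the message for the first call only, which is the effect the pr_warn_once() suggestion is after. Booting the crash kernel with "numa=off", as discussed above, avoids the condition instead of silencing the repeats.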
* Re: arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP
  2017-11-15 10:58 ` Bhupesh Sharma
@ 2017-11-24 8:47 ` Dave Young
  -1 siblings, 0 replies; 135+ messages in thread
From: Dave Young @ 2017-11-24 8:47 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, AKASHI Takahiro, Matt Fleming, linux-arm-kernel, linux-efi, Mark Rutland, James Morse, Bhupesh SHARMA

[snip]
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 7768423b39d3..61d867647cca 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -213,7 +213,7 @@ static void __init request_standard_resources(void)
> >
> >         for_each_memblock(memory, region) {
> >                 res = alloc_bootmem_low(sizeof(*res));
> > -               if (memblock_is_nomap(region)) {
> > +               if (memblock_is_nomap(region) || memblock_is_reserved(region)) {
> >                         res->name = "reserved";
> >                         res->flags = IORESOURCE_MEM;
> >                 } else {
> >
>

Bhupesh, does insert resource work in efi_init/reserve_regions()?

Thanks
Dave

^ permalink raw reply [flat|nested] 135+ messages in thread
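[Editorial sketch, not part of the original message.] The condition changed by the quoted hunk can be modelled in plain userspace C as below. struct region_sketch and resource_name() are hypothetical stand-ins, not kernel types or functions, and the "System RAM" name returned by the else branch is an assumption, since the quoted hunk does not show that branch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a memblock region; not the kernel's struct. */
struct region_sketch {
	unsigned long long base;
	unsigned long long size;
	bool nomap;     /* region is excluded from the linear mapping */
	bool reserved;  /* region is covered by a reserved range */
};

/*
 * Mirror the condition proposed in the quoted hunk: a region that is
 * NOMAP or reserved is reported under the "reserved" resource name.
 * The name used for every other region is assumed for illustration.
 */
static const char *resource_name(const struct region_sketch *r)
{
	if (r->nomap || r->reserved)
		return "reserved";
	return "System RAM";
}
```

Under this model a reserved-but-mapped region is reported as "reserved", the same as a NOMAP one, which is the behavioural change the proposed hunk makes.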
end of thread, other threads:[~2018-01-09 11:47 UTC | newest]

Thread overview: 135+ messages