* makedumpfile: get_max_mapnr() from ELF header problem
From: Michael Holzheu @ 2014-02-28 12:41 UTC
To: Atsushi Kumagai; +Cc: d.hatayama, kexec

Hello Atsushi,

On s390 we have the following little problem:

We use hypervisor or stand-alone dump tools to create Linux system
dumps. These tools do not know the kernel parameter line and dump the
full physical memory.

We use makedumpfile to filter those dumps.

If a Linux system has specified the "mem=" parameter, the dump tools
still dump the whole physical memory.

Unfortunately, in get_max_mapnr() makedumpfile uses the ELF header to
get the maximum page frame number. Since this is not the correct value
in our case, makedumpfile fails to filter the dump.

We get the following error on s390 with makedumpfile version 1.5.3:

makedumpfile -c -d 31 vmcore dump.kdump
cyclic buffer size has been changed: 22156083 => 22156032
Excluding unnecessary pages : [ 21 %] vtop_s390x: Address too big for the number of page table levels.
readmem: Can't convert a virtual address(8000180104670) to physical address.
readmem: type_addr: 0, addr:8000180104670, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
Excluding unnecessary pages : [ 23 %] vtop_s390x: Address too big for the number of page table levels.
readmem: Can't convert a virtual address(8000180104670) to physical address.
readmem: type_addr: 0, addr:8000180104670, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.

Since version 1.5.4 makedumpfile seems to loop in __exclude_unnecessary_pages().

We thought about several ways to fix this problem but have not found a
good solution up to now.

Do you have an idea how we could fix that?
Best Regards,
Michael

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* RE: makedumpfile: get_max_mapnr() from ELF header problem
From: Atsushi Kumagai @ 2014-03-03 3:11 UTC
To: holzheu; +Cc: d.hatayama, kexec

Hello Michael,

>Hello Atsushi,
>
>On s390 we have the following little problem:
>
>We use hypervisor or stand-alone dump tools to create Linux system
>dumps. These tools do not know the kernel parameter line and dump the
>full physical memory.
>
>We use makedumpfile to filter those dumps.
>
>If a Linux system has specified the "mem=" parameter, the dump tools
>still dump the whole physical memory.

I guess this is a problem of the tools; it sounds like the tools ignore
the actual memory map and just make wrong ELF headers.
How do the tools decide the range of System RAM to create ELF headers?

At least, if the tools respect the actual memory map like /proc/vmcore,
they can create correct ELF headers and makedumpfile will work normally.

>Unfortunately, in get_max_mapnr() makedumpfile uses the ELF header to
>get the maximum page frame number. Since this is not the correct value
>in our case, makedumpfile fails to filter the dump.

makedumpfile depends on the ELF file format, you know. I think you
should fix the tools to create correct ELF files.

Thanks
Atsushi Kumagai

>We get the following error on s390 with makedumpfile version 1.5.3:
>
>makedumpfile -c -d 31 vmcore dump.kdump
>cyclic buffer size has been changed: 22156083 => 22156032
>Excluding unnecessary pages : [ 21 %] vtop_s390x: Address too big for the number of page table levels.
>readmem: Can't convert a virtual address(8000180104670) to physical address.
>readmem: type_addr: 0, addr:8000180104670, size:32768
>__exclude_unnecessary_pages: Can't read the buffer of struct page.
>Excluding unnecessary pages : [ 23 %] vtop_s390x: Address too big for the number of page table levels.
>readmem: Can't convert a virtual address(8000180104670) to physical address.
>readmem: type_addr: 0, addr:8000180104670, size:32768
>__exclude_unnecessary_pages: Can't read the buffer of struct page.
>
>Since version 1.5.4 makedumpfile seems to loop in __exclude_unnecessary_pages().
>
>We thought about several ways to fix this problem but have not found a
>good solution up to now.
>
>Do you have an idea how we could fix that?
>
>Best Regards,
>Michael
* Re: makedumpfile: get_max_mapnr() from ELF header problem
From: Michael Holzheu @ 2014-03-03 9:44 UTC
To: Atsushi Kumagai; +Cc: d.hatayama, kexec

On Mon, 3 Mar 2014 03:11:23 +0000
Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> wrote:

> Hello Michael,
>
> >Hello Atsushi,
> >
> >On s390 we have the following little problem:
> >
> >We use hypervisor or stand-alone dump tools to create Linux system
> >dumps. These tools do not know the kernel parameter line and dump the
> >full physical memory.
> >
> >We use makedumpfile to filter those dumps.
> >
> >If a Linux system has specified the "mem=" parameter, the dump tools
> >still dump the whole physical memory.
>
> I guess this is a problem of the tools; it sounds like the tools ignore
> the actual memory map and just make wrong ELF headers.
> How do the tools decide the range of System RAM to create ELF headers?

The tools do a physical memory detection, and that defines the range
of memory to be dumped and also the memory chunks for the ELF header.

And I think we are not the only ones that have this problem. For example,
the KVM virsh dump probably also has this problem.

> At least, if the tools respect the actual memory map like /proc/vmcore,
> they can create correct ELF headers and makedumpfile will work normally.

As I said, the tools do not know the Linux memory map. They only know
the physically available memory.

Michael
* RE: makedumpfile: get_max_mapnr() from ELF header problem
From: Atsushi Kumagai @ 2014-03-11 6:22 UTC
To: holzheu; +Cc: d.hatayama, kexec

>On Mon, 3 Mar 2014 03:11:23 +0000
>Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> wrote:
>
>> Hello Michael,
>>
>> >On s390 we have the following little problem:
>> >
>> >We use hypervisor or stand-alone dump tools to create Linux system
>> >dumps. These tools do not know the kernel parameter line and dump the
>> >full physical memory.
>> >
>> >We use makedumpfile to filter those dumps.
>> >
>> >If a Linux system has specified the "mem=" parameter, the dump tools
>> >still dump the whole physical memory.
>>
>> I guess this is a problem of the tools; it sounds like the tools ignore
>> the actual memory map and just make wrong ELF headers.
>> How do the tools decide the range of System RAM to create ELF headers?
>
>The tools do a physical memory detection, and that defines the range
>of memory to be dumped and also the memory chunks for the ELF header.

makedumpfile is designed for kdump; this means it relies on dependable ELF
headers. If we supported such incorrect ELF headers, makedumpfile would have
to get the actual memory map from the vmcore (but I have no idea how to do
that now) and re-calculate all PT_LOAD regions with it. That sounds like too
much work for an irregular case, so I don't plan to take care of it now.

>And I think we are not the only ones that have this problem. For example,
>the KVM virsh dump probably also has this problem.

virsh dump seems to have the same issue as you said, but I suppose the qemu
developers don't worry about that because they are developing their own way
to dump a guest's memory in kdump-compressed format, the "dump-guest-memory"
command. It seems that they know such cases are out of the scope of
makedumpfile.

Thanks
Atsushi Kumagai

>> At least, if the tools respect the actual memory map like /proc/vmcore,
>> they can create correct ELF headers and makedumpfile will work normally.
>
>As I said, the tools do not know the Linux memory map. They only know
>the physically available memory.
>
>Michael
* Re: makedumpfile: get_max_mapnr() from ELF header problem
From: Michael Holzheu @ 2014-03-11 11:35 UTC
To: Atsushi Kumagai; +Cc: d.hatayama, kexec

On Tue, 11 Mar 2014 06:22:41 +0000
Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> wrote:

> >The tools do a physical memory detection, and that defines the range
> >of memory to be dumped and also the memory chunks for the ELF header.
>
> makedumpfile is designed for kdump; this means it relies on dependable ELF
> headers. If we supported such incorrect ELF headers, makedumpfile would
> have to get the actual memory map from the vmcore (but I have no idea how
> to do that now) and re-calculate all PT_LOAD regions with it. That sounds
> like too much work for an irregular case, so I don't plan to take care of
> it now.

Ok, fair.

> >And I think we are not the only ones that have this problem. For example,
> >the KVM virsh dump probably also has this problem.
>
> virsh dump seems to have the same issue as you said, but I suppose the
> qemu developers don't worry about that because they are developing their
> own way to dump a guest's memory in kdump-compressed format, the
> "dump-guest-memory" command. It seems that they know such cases are out
> of the scope of makedumpfile.

Even if they create a dump in kdump-compressed format, they (probably) do
not filter while dumping. Therefore, for large dumps post-processing with
makedumpfile could still make sense, e.g. for transferring the dumps.
Because qemu is not aware of kernel parameters, this will also fail when
"mem=" has been used.

Michael
* Re: makedumpfile: get_max_mapnr() from ELF header problem
From: HATAYAMA Daisuke @ 2014-03-12 4:15 UTC
To: Michael Holzheu; +Cc: Atsushi Kumagai, kexec

(2014/02/28 21:41), Michael Holzheu wrote:
> Hello Atsushi,
>
> On s390 we have the following little problem:
>
> We use hypervisor or stand-alone dump tools to create Linux system
> dumps. These tools do not know the kernel parameter line and dump the
> full physical memory.
>
> We use makedumpfile to filter those dumps.
>
> If a Linux system has specified the "mem=" parameter, the dump tools
> still dump the whole physical memory.
>
> Unfortunately, in get_max_mapnr() makedumpfile uses the ELF header to
> get the maximum page frame number. Since this is not the correct value
> in our case, makedumpfile fails to filter the dump.
>
> We get the following error on s390 with makedumpfile version 1.5.3:
>
> makedumpfile -c -d 31 vmcore dump.kdump
> cyclic buffer size has been changed: 22156083 => 22156032
> Excluding unnecessary pages : [ 21 %] vtop_s390x: Address too big for the number of page table levels.
> readmem: Can't convert a virtual address(8000180104670) to physical address.
> readmem: type_addr: 0, addr:8000180104670, size:32768
> __exclude_unnecessary_pages: Can't read the buffer of struct page.
> Excluding unnecessary pages : [ 23 %] vtop_s390x: Address too big for the number of page table levels.
> readmem: Can't convert a virtual address(8000180104670) to physical address.
> readmem: type_addr: 0, addr:8000180104670, size:32768
> __exclude_unnecessary_pages: Can't read the buffer of struct page.
>
> Since version 1.5.4 makedumpfile seems to loop in __exclude_unnecessary_pages().
>
> We thought about several ways to fix this problem but have not found a
> good solution up to now.
>
> Do you have an idea how we could fix that?
>
> Best Regards,
> Michael
>

At least on x86, makedumpfile appears to work well for dumps generated by
sadump and virsh dump. In particular, virsh dump --memory-only generates a
dump in ELF whose PT_LOAD entries are generated from the RAM list managed
by qemu, not by the kernel.

Looking into the source code a little, max_mapnr is used only for
calculating the size of two bitmaps. I guess there is some s390-specific
issue.

--
Thanks.
HATAYAMA, Daisuke
* RE: makedumpfile: get_max_mapnr() from ELF header problem
From: Atsushi Kumagai @ 2014-03-12 6:01 UTC
To: d.hatayama, holzheu; +Cc: kexec

>(2014/02/28 21:41), Michael Holzheu wrote:
>> Hello Atsushi,
>>
>> On s390 we have the following little problem:
>>
>> We use hypervisor or stand-alone dump tools to create Linux system
>> dumps. These tools do not know the kernel parameter line and dump the
>> full physical memory.
>>
>> We use makedumpfile to filter those dumps.
>>
>> If a Linux system has specified the "mem=" parameter, the dump tools
>> still dump the whole physical memory.
>>
>> Unfortunately, in get_max_mapnr() makedumpfile uses the ELF header to
>> get the maximum page frame number. Since this is not the correct value
>> in our case, makedumpfile fails to filter the dump.
>>
>> Since version 1.5.4 makedumpfile seems to loop in __exclude_unnecessary_pages().
>>
>> We thought about several ways to fix this problem but have not found a
>> good solution up to now.
>>
>> Do you have an idea how we could fix that?
>
>At least on x86, makedumpfile appears to work well for dumps generated by
>sadump and virsh dump. In particular, virsh dump --memory-only generates a
>dump in ELF whose PT_LOAD entries are generated from the RAM list managed
>by qemu, not by the kernel.
>
>Looking into the source code a little, max_mapnr is used only for
>calculating the size of two bitmaps. I guess there is some s390-specific
>issue.

On second thought, Michael's log looks strange.

>> Excluding unnecessary pages : [ 21 %] vtop_s390x: Address too big for the number of page table levels.
>> readmem: Can't convert a virtual address(8000180104670) to physical address.
>> readmem: type_addr: 0, addr:8000180104670, size:32768
>> __exclude_unnecessary_pages: Can't read the buffer of struct page.

This message was shown while translating the virtual address of mem_map
to a physical address:

__exclude_unnecessary_pages():

	if (!readmem(VADDR, mem_map,
		     page_cache + (index_pg * SIZE(page)),
		     SIZE(page) * pfn_mm)) {
		ERRMSG("Can't read the buffer of struct page.\n");
		return FALSE;
	}

However, this should succeed even if mem= was specified, because the
corresponding page table must exist in the memory image since it was
actually used by the kernel. The address translation logic may have an
issue.

Thanks
Atsushi Kumagai
* Re: makedumpfile: get_max_mapnr() from ELF header problem
From: Michael Holzheu @ 2014-03-12 16:18 UTC
To: Atsushi Kumagai; +Cc: d.hatayama, kexec

On Wed, 12 Mar 2014 06:01:47 +0000
Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> wrote:

> >(2014/02/28 21:41), Michael Holzheu wrote:
> >> Hello Atsushi,

[snip]

> >Looking into the source code a little, max_mapnr is used only for
> >calculating the size of two bitmaps. I guess there is some
> >s390-specific issue.
>
> On second thought, Michael's log looks strange.
>
> >> Excluding unnecessary pages : [ 21 %] vtop_s390x: Address too big for the number of page table levels.
> >> readmem: Can't convert a virtual address(8000180104670) to physical address.
> >> readmem: type_addr: 0, addr:8000180104670, size:32768
> >> __exclude_unnecessary_pages: Can't read the buffer of struct page.
>
> This message was shown while translating the virtual address of mem_map
> to a physical address:
>
> __exclude_unnecessary_pages():
>
> 	if (!readmem(VADDR, mem_map,
> 		     page_cache + (index_pg * SIZE(page)),
> 		     SIZE(page) * pfn_mm)) {
> 		ERRMSG("Can't read the buffer of struct page.\n");
> 		return FALSE;
> 	}
>
> However, this should succeed even if mem= was specified, because the
> corresponding page table must exist in the memory image since it was
> actually used by the kernel. The address translation logic may have an
> issue.

To be honest, I don't really understand what happens when the error
occurs.

My test is a 1 TiB dump of a Linux system that has set mem=1G.

With makedumpfile 1.5.3 I see the following stack backtrace:

(gdb) bt
#0  vtop_s390x (vaddr=2251803034918936) at ./arch/s390x.c:236
#1  0x000000008001de44 in vaddr_to_paddr_s390x (vaddr=2251803034918936)
    at ./arch/s390x.c:300
#2  0x000000008001fb50 in readmem (type_addr=0, addr=2251803034918936,
    bufptr=0x3ffffff6cf0, size=32768) at makedumpfile.c:349
#3  0x0000000080034cf2 in __exclude_unnecessary_pages (
    mem_map=2251803034918936, pfn_start=16777216, pfn_end=16842752)
    at makedumpfile.c:4189
#4  0x0000000080035716 in exclude_unnecessary_pages_cyclic ()
    at makedumpfile.c:4349
#5  0x00000000800358e4 in update_cyclic_region (pfn=0) at makedumpfile.c:4380
#6  0x00000000800384e0 in get_num_dumpable_cyclic () at makedumpfile.c:5060
#7  0x0000000080036850 in create_dump_bitmap () at makedumpfile.c:4585
#8  0x00000000800429c8 in create_dumpfile () at makedumpfile.c:7533
#9  0x00000000800490fc in main (argc=5, argv=0x3fffffff3d8)
    at makedumpfile.c:8651

Looks like makedumpfile wants to read the virtual address 2251803034918936
(hex 0x80000C0002018), which can't be resolved by the three-level kernel
page table (max is 4 TiB here).

In the __exclude_unnecessary_pages() function the variables have the
following values:

(gdb) print pfn_end
$1 = 16842752
(gdb) print pfn
$2 = 16777216
(gdb) print pfn_start
$3 = 16777216
(gdb) print mem_map
$4 = 2251803034918936

I would appreciate any hints!

Michael
* RE: makedumpfile: get_max_mapnr() from ELF header problem
From: Atsushi Kumagai @ 2014-03-14 8:54 UTC
To: holzheu; +Cc: d.hatayama, kexec

>My test is a 1 TiB dump of a Linux system that has set mem=1G.
>
>With makedumpfile 1.5.3 I see the following stack backtrace:
>
>(gdb) bt
>#0  vtop_s390x (vaddr=2251803034918936) at ./arch/s390x.c:236
>#1  0x000000008001de44 in vaddr_to_paddr_s390x (vaddr=2251803034918936)
>    at ./arch/s390x.c:300
>#2  0x000000008001fb50 in readmem (type_addr=0, addr=2251803034918936,
>    bufptr=0x3ffffff6cf0, size=32768) at makedumpfile.c:349
>#3  0x0000000080034cf2 in __exclude_unnecessary_pages (
>    mem_map=2251803034918936, pfn_start=16777216, pfn_end=16842752)
>    at makedumpfile.c:4189
>#4  0x0000000080035716 in exclude_unnecessary_pages_cyclic ()
>    at makedumpfile.c:4349
>#5  0x00000000800358e4 in update_cyclic_region (pfn=0) at makedumpfile.c:4380
>#6  0x00000000800384e0 in get_num_dumpable_cyclic () at makedumpfile.c:5060
>#7  0x0000000080036850 in create_dump_bitmap () at makedumpfile.c:4585
>#8  0x00000000800429c8 in create_dumpfile () at makedumpfile.c:7533
>#9  0x00000000800490fc in main (argc=5, argv=0x3fffffff3d8)
>    at makedumpfile.c:8651
>
>Looks like makedumpfile wants to read the virtual address 2251803034918936
>(hex 0x80000C0002018), which can't be resolved by the three-level kernel
>page table (max is 4 TiB here).
>
>In the __exclude_unnecessary_pages() function the variables have the
>following values:
>
>(gdb) print pfn_end
>$1 = 16842752
>(gdb) print pfn
>$2 = 16777216
>(gdb) print pfn_start
>$3 = 16777216
>(gdb) print mem_map
>$4 = 2251803034918936
>
>I would appreciate any hints!

What were these values when you didn't specify mem=1G?
The mem_map information can be shown with the -D option like below:

$ makedumpfile -D -cd31 vmcore dumpfile
...
Memory type  : SPARSEMEM

mem_map (0)
  mem_map    : f6d7f000
  pfn_start  : 0
  pfn_end    : 20000
mem_map (1)
  mem_map    : f697f000
  pfn_start  : 20000
  pfn_end    : 40000
...

Could you show me both the log with mem=1G and the log without the mem=
option? The difference may help us.

BTW, is the command below the actual command line you used?

  makedumpfile -c -d 31 vmcore dump.kdump

If yes, the dumpfile must have VMCOREINFO, since you specified neither the
-x nor the -i option; i.e., the s390 dump tool really generates a PT_NOTE
entry as VMCOREINFO?

Thanks
Atsushi Kumagai
* Re: makedumpfile: get_max_mapnr() from ELF header problem
From: Michael Holzheu @ 2014-03-14 14:19 UTC
To: Atsushi Kumagai; +Cc: d.hatayama, kexec

Hello Atsushi,

I debugged my problem a bit further and tried to implement a function
that gets the maximum page frame number from the Linux kernel memory
management structures.

I am no memory management expert, so the following patch probably is not
complete, but at least for my setup it worked.

Michael
---
 makedumpfile.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -2029,6 +2029,48 @@ pgdat4:
 	return SYMBOL(contig_page_data);
 }
 
+int
+get_max_pfn(void)
+{
+	unsigned long pgdat, node_start_pfn, node_spanned_pages, max_pfn = 0;
+	int num_nodes, node;
+
+	if ((node = next_online_node(0)) < 0) {
+		ERRMSG("Can't get next online node.\n");
+		return FALSE;
+	}
+	if (!(pgdat = next_online_pgdat(node))) {
+		ERRMSG("Can't get pgdat list.\n");
+		return FALSE;
+	}
+	for (num_nodes = 1; num_nodes <= vt.numnodes; num_nodes++) {
+		if (!readmem(VADDR, pgdat + OFFSET(pglist_data.node_start_pfn),
+			     &node_start_pfn, sizeof node_start_pfn)) {
+			ERRMSG("Can't get node_start_pfn.\n");
+			return FALSE;
+		}
+		if (!readmem(VADDR,
+			     pgdat + OFFSET(pglist_data.node_spanned_pages),
+			     &node_spanned_pages, sizeof node_spanned_pages)) {
+			ERRMSG("Can't get node_spanned_pages.\n");
+			return FALSE;
+		}
+		max_pfn = MAX(max_pfn, (node_start_pfn + node_spanned_pages));
+		if (num_nodes < vt.numnodes) {
+			if ((node = next_online_node(node + 1)) < 0) {
+				ERRMSG("Can't get next online node.\n");
+				return FALSE;
+			} else if (!(pgdat = next_online_pgdat(node))) {
+				ERRMSG("Can't determine pgdat list (node %d).\n",
+				       node);
+				return FALSE;
+			}
+		}
+	}
+	info->max_mapnr = max_pfn;
+	return TRUE;
+}
+
 void
 dump_mem_map(unsigned long long pfn_start,
 	     unsigned long long pfn_end, unsigned long mem_map, int num_mm)
@@ -2853,6 +2908,9 @@ out:
 	if (!get_numnodes())
 		return FALSE;
 
+	if (!get_max_pfn())
+		return FALSE;
+
 	if (!get_mem_map())
 		return FALSE;
* RE: makedumpfile: get_max_mapnr() from ELF header problem
From: Atsushi Kumagai @ 2014-03-19 7:14 UTC
To: holzheu; +Cc: d.hatayama, kexec

>Hello Atsushi,
>
>I debugged my problem a bit further and tried to implement a function
>that gets the maximum page frame number from the Linux kernel memory
>management structures.
>
>I am no memory management expert, so the following patch probably is not
>complete, but at least for my setup it worked.

The patch looks good for your case, but I don't think it's a proper
approach for this problem.

Now, I think this is a problem of get_mm_sparsemem() in makedumpfile.
To say it in more detail, the problem is the wrong calculation of the
addresses of unused mem_maps.

Looking at the log you sent, some addresses of mem_map corresponding
to unused pages look invalid, like below:

mem_map (256)
  mem_map    : 80000c0002018
  pfn_start  : 1000000
  pfn_end    : 1010000
mem_map (257)
  mem_map    : 800001840400000
  pfn_start  : 1010000
  pfn_end    : 1020000
...
mem_map (544)
  mem_map    : a82400012f14fffc
  pfn_start  : 2200000
  pfn_end    : 2210000

...(and more)

However, makedumpfile should calculate such unused mem_map addresses
as 0 (NOT_MEMMAP_ADDR). Actually it works as expected at least in my
environment (x86_64):

...
mem_map (16)
  mem_map    : 0
  pfn_start  : 80000
  pfn_end    : 88000
mem_map (17)
  mem_map    : 0
  pfn_start  : 88000
  pfn_end    : 90000
...

makedumpfile gets the address from mem_section.section_mem_map, which
is initialized with zero:

[CONFIG_SPARSEMEM_EXTREME]

paging_init()
  sparse_memory_present_with_active_regions()
    memory_present()
      sparse_index_init()
        sparse_index_alloc()  // allocate mem_section with kzalloc()

makedumpfile assumes the value of an unused mem_section will remain 0,
but I suspect this assumption may be broken in your environment.

Moreover, if that's true, the problem will also happen when the memmap=
parameter is specified, even if max_mapnr is correct. This is because
the unused pages created by memmap= will be placed below max_mapnr and
their mem_map addresses will be calculated wrongly by get_mm_sparsemem().

I'll continue to investigate to find a better solution for this problem;
any comments are helpful.

Thanks
Atsushi Kumagai

>Michael
>---
> makedumpfile.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 58 insertions(+)
>
>--- a/makedumpfile.c
>+++ b/makedumpfile.c
>@@ -2029,6 +2029,48 @@ pgdat4:
> 	return SYMBOL(contig_page_data);
> }
>
>+int
>+get_max_pfn(void)
>+{
>+	unsigned long pgdat, node_start_pfn, node_spanned_pages, max_pfn = 0;
>+	int num_nodes, node;
>+
>+	if ((node = next_online_node(0)) < 0) {
>+		ERRMSG("Can't get next online node.\n");
>+		return FALSE;
>+	}
>+	if (!(pgdat = next_online_pgdat(node))) {
>+		ERRMSG("Can't get pgdat list.\n");
>+		return FALSE;
>+	}
>+	for (num_nodes = 1; num_nodes <= vt.numnodes; num_nodes++) {
>+		if (!readmem(VADDR, pgdat + OFFSET(pglist_data.node_start_pfn),
>+			     &node_start_pfn, sizeof node_start_pfn)) {
>+			ERRMSG("Can't get node_start_pfn.\n");
>+			return FALSE;
>+		}
>+		if (!readmem(VADDR,
>+			     pgdat + OFFSET(pglist_data.node_spanned_pages),
>+			     &node_spanned_pages, sizeof node_spanned_pages)) {
>+			ERRMSG("Can't get node_spanned_pages.\n");
>+			return FALSE;
>+		}
>+		max_pfn = MAX(max_pfn, (node_start_pfn + node_spanned_pages));
>+		if (num_nodes < vt.numnodes) {
>+			if ((node = next_online_node(node + 1)) < 0) {
>+				ERRMSG("Can't get next online node.\n");
>+				return FALSE;
>+			} else if (!(pgdat = next_online_pgdat(node))) {
>+				ERRMSG("Can't determine pgdat list (node %d).\n",
>+				       node);
>+				return FALSE;
>+			}
>+		}
>+	}
>+	info->max_mapnr = max_pfn;
>+	return TRUE;
>+}
>+
> void
> dump_mem_map(unsigned long long pfn_start,
> 	     unsigned long long pfn_end, unsigned long mem_map, int num_mm)
>@@ -2853,6 +2908,9 @@ out:
> 	if (!get_numnodes())
> 		return FALSE;
>
>+	if (!get_max_pfn())
>+		return FALSE;
>+
> 	if (!get_mem_map())
> 		return FALSE;
* Re: makedumpfile: get_max_mapnr() from ELF header problem 2014-03-19 7:14 ` Atsushi Kumagai @ 2014-03-19 18:29 ` Michael Holzheu 2014-03-20 10:23 ` Michael Holzheu [not found] ` <20140319180903.2c6e2b72@holzheu> 2 siblings, 0 replies; 15+ messages in thread From: Michael Holzheu @ 2014-03-19 18:29 UTC (permalink / raw) To: Atsushi Kumagai; +Cc: d.hatayama, kexec On Wed, 19 Mar 2014 07:14:25 +0000 Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> wrote: > >Hello Atsushi, > > > >I debugged my problem a bit further and tried to implement > >a function that gets the maximum page frame number from the > >Linux kernel memory management structures. > > > >I am no memory management expert, so the following patch probably > >is not complete, but at least for my setup it worked. > > The patch looks good for your case, but I don't think it's a proper > approach for this problem. > > Now, I think this is a problem of get_mm_sparsemem() in makedumpfile. > To say in more detail, the problem is "wrong calculating the address > of unused mem_map". > > Looking at the log you sent, some addresses of mem_map corresponding > to unused pages look invalid like below: > > mem_map (256) > mem_map : 80000c0002018 > pfn_start : 1000000 > pfn_end : 1010000 > mem_map (257) > mem_map : 800001840400000 > pfn_start : 1010000 > pfn_end : 1020000 > ... > mem_map (544) > mem_map : a82400012f14fffc > pfn_start : 2200000 > pfn_end : 2210000 > > ...(and more) > > However, makedumpfile should calculate such unused mem_map addresses > as 0(NOT_MEMMAP_ADDR). Actually it works as expected at least in my > environment(x86_64): > > ... > mem_map (16) > mem_map : 0 > pfn_start : 80000 > pfn_end : 88000 > mem_map (17) > mem_map : 0 > pfn_start : 88000 > pfn_end : 90000 > ... 
> > makedumpfile gets the address from mem_section.section_mem_map, > which is initialized to zero: > > [CONFIG_SPARSEMEM_EXTREME] > paging_init() > sparse_memory_present_with_active_regions() > memory_present() > sparse_index_init() > sparse_index_alloc() // allocate mem_section with kzalloc() > > makedumpfile assumes the value of unused mem_section will remain as 0, > but I suspect this assumption may be broken in your environment. > Hello Atsushi, I noticed that my last patch was not complete. It only checked the mem_section[] array for zero entries. But as you noticed, we also have to check the mem_map address that we get from the mem_section entries. So I updated the patch. Michael --- makedumpfile.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) --- a/makedumpfile.c +++ b/makedumpfile.c @@ -2690,11 +2690,14 @@ nr_to_section(unsigned long nr, unsigned { unsigned long addr; - if (is_sparsemem_extreme()) + if (is_sparsemem_extreme()) { + if (mem_sec[SECTION_NR_TO_ROOT(nr)] == 0) + return NOT_KV_ADDR; addr = mem_sec[SECTION_NR_TO_ROOT(nr)] + (nr & SECTION_ROOT_MASK()) * SIZE(mem_section); - else + } else { addr = SYMBOL(mem_section) + (nr * SIZE(mem_section)); + } if (!is_kvaddr(addr)) return NOT_KV_ADDR; @@ -2778,10 +2781,19 @@ get_mm_sparsemem(void) } for (section_nr = 0; section_nr < num_section; section_nr++) { section = nr_to_section(section_nr, mem_sec); - mem_map = section_mem_map_addr(section); - mem_map = sparse_decode_mem_map(mem_map, section_nr); - if (!is_kvaddr(mem_map)) + if (section == NOT_KV_ADDR) { mem_map = NOT_MEMMAP_ADDR; + } else { + mem_map = section_mem_map_addr(section); + if (mem_map == 0) { + mem_map = NOT_MEMMAP_ADDR; + } else { + mem_map = sparse_decode_mem_map(mem_map, + section_nr); + if (!is_kvaddr(mem_map)) + mem_map = NOT_MEMMAP_ADDR; + } + } pfn_start = section_nr * PAGES_PER_SECTION(); pfn_end = pfn_start + PAGES_PER_SECTION(); if (info->max_mapnr < pfn_end) 
_______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 15+ messages in thread
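The zero-root check added to nr_to_section() in the patch above can be exercised in isolation. The sketch below is a simplified stand-in: SECTIONS_PER_ROOT, the section size, and the NOT_KV_ADDR sentinel are hypothetical placeholder values, and a flat array replaces makedumpfile's real mem_sec[] handling.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the real ones come from the kernel config and
 * makedumpfile's symbol/size tables. */
#define SECTIONS_PER_ROOT   4UL
#define SIZE_MEM_SECTION    32UL
#define NOT_KV_ADDR         0UL                 /* "invalid" sentinel */

#define SECTION_NR_TO_ROOT(nr)  ((nr) / SECTIONS_PER_ROOT)
#define SECTION_ROOT_MASK       (SECTIONS_PER_ROOT - 1)

/* With SPARSEMEM_EXTREME, mem_section[] roots are kzalloc'ed, so an
 * unused root stays 0.  Without the explicit check, this would return
 * a small bogus offset (computed from a NULL base) that a KVBASE==0
 * architecture like s390 cannot reject later. */
unsigned long nr_to_section(unsigned long nr, const unsigned long *mem_sec)
{
    if (mem_sec[SECTION_NR_TO_ROOT(nr)] == 0)
        return NOT_KV_ADDR;
    return mem_sec[SECTION_NR_TO_ROOT(nr)] +
           (nr & SECTION_ROOT_MASK) * SIZE_MEM_SECTION;
}
```

With mem_sec = {0x2fe6f800, 0, ...} (the shape of the crash output quoted later in the thread), section numbers within the first root resolve normally and everything else comes back as NOT_KV_ADDR instead of a tiny garbage address.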
* Re: makedumpfile: get_max_mapnr() from ELF header problem 2014-03-19 7:14 ` Atsushi Kumagai 2014-03-19 18:29 ` Michael Holzheu @ 2014-03-20 10:23 ` Michael Holzheu [not found] ` <20140319180903.2c6e2b72@holzheu> 2 siblings, 0 replies; 15+ messages in thread From: Michael Holzheu @ 2014-03-20 10:23 UTC (permalink / raw) To: Atsushi Kumagai; +Cc: d.hatayama, kexec On Wed, 19 Mar 2014 07:14:25 +0000 Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> wrote: > >Hello Atsushi, > > > >I debugged my problem a bit further and tried to implement > >a function that gets the maximum page frame number from the > >Linux kernel memory management structures. > > > >I am no memory management expert, so the following patch probably > >is not complete, but at least for my setup it worked. > > The patch looks good for your case, but I don't think it's a proper > approach for this problem. Hello Atsushi, If you don't like that solution, what about using the mem_map_data[] array of makedumpfile to adjust "max_mapnr"? The patch below also works fine for my dump. Michael --- makedumpfile.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) --- a/makedumpfile.c +++ b/makedumpfile.c @@ -2829,7 +2829,8 @@ get_mem_map_without_mm(void) int get_mem_map(void) { - int ret; + unsigned long max_pfn = 0; + int ret, i; switch (get_mem_type()) { case SPARSEMEM: @@ -2861,6 +2862,17 @@ get_mem_map(void) ret = FALSE; break; } + /* + * Adjust "max_mapnr" for the case that Linux uses less memory + * than is dumped. For example when "mem=" has been used for the + * dumped system. 
+ */ + for (i = 0; i < info->num_mem_map; i++) { + if (info->mem_map_data[i].mem_map == NOT_MEMMAP_ADDR) + continue; + max_pfn = MAX(max_pfn, info->mem_map_data[i].pfn_end); + } + info->max_mapnr = max_pfn; return ret; } _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 15+ messages in thread
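The max_mapnr adjustment in the patch above can be sketched as a stand-alone function. The struct below is a hypothetical, pared-down version of makedumpfile's mem_map_data entries, just enough to show the scan:

```c
#include <assert.h>

/* Simplified stand-in for makedumpfile's mem_map_data; the field names
 * follow the patch above, but this is not the real structure. */
#define NOT_MEMMAP_ADDR 0UL

struct mem_map_data {
    unsigned long mem_map;    /* NOT_MEMMAP_ADDR if the range is unused */
    unsigned long pfn_start;
    unsigned long pfn_end;
};

/* Highest pfn_end among ranges that really have a mem_map: ranges the
 * stand-alone dump tool captured but Linux never used (e.g. memory
 * above a "mem=" limit) are skipped, so max_mapnr shrinks to what the
 * kernel actually managed. */
unsigned long adjust_max_mapnr(const struct mem_map_data *m, int n)
{
    unsigned long max_pfn = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (m[i].mem_map == NOT_MEMMAP_ADDR)
            continue;
        if (m[i].pfn_end > max_pfn)
            max_pfn = m[i].pfn_end;
    }
    return max_pfn;
}
```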
[parent not found: <20140319180903.2c6e2b72@holzheu>]
* RE: makedumpfile: get_max_mapnr() from ELF header problem [not found] ` <20140319180903.2c6e2b72@holzheu> @ 2014-03-25 1:14 ` Atsushi Kumagai 2014-03-25 15:24 ` Michael Holzheu 0 siblings, 1 reply; 15+ messages in thread From: Atsushi Kumagai @ 2014-03-25 1:14 UTC (permalink / raw) To: holzheu; +Cc: d.hatayama, kexec >> Now, I think this is a problem of get_mm_sparsemem() in makedumpfile. >> To say in more detail, the problem is "wrongly calculating the address >> of unused mem_map". >> >> Looking at the log you sent, some addresses of mem_map corresponding >> to unused pages look invalid like below: >> >> mem_map (256) >> mem_map : 80000c0002018 >> pfn_start : 1000000 >> pfn_end : 1010000 >> mem_map (257) >> mem_map : 800001840400000 >> pfn_start : 1010000 >> pfn_end : 1020000 >> ... >> mem_map (544) >> mem_map : a82400012f14fffc >> pfn_start : 2200000 >> pfn_end : 2210000 >> >> ...(and more) >> >> However, makedumpfile should calculate such unused mem_map addresses >> as 0 (NOT_MEMMAP_ADDR). Actually it works as expected at least in my >> environment (x86_64): >> >> ... >> mem_map (16) >> mem_map : 0 >> pfn_start : 80000 >> pfn_end : 88000 >> mem_map (17) >> mem_map : 0 >> pfn_start : 88000 >> pfn_end : 90000 >> ... >> >> makedumpfile gets the address from mem_section.section_mem_map, >> which is initialized to zero: >> >> [CONFIG_SPARSEMEM_EXTREME] >> paging_init() >> sparse_memory_present_with_active_regions() >> memory_present() >> sparse_index_init() >> sparse_index_alloc() // allocate mem_section with kzalloc() >> >> makedumpfile assumes the value of unused mem_section will remain as 0, >> but I suspect this assumption may be broken in your environment. > >No, I think your assumption is true also for my environment. For my >dump the "mem_section" array is zero except for the first entry. > >crash> print/x mem_section >$1 = {0x2fe6f800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...} > >But it looks like get_mm_sparsemem() does not check for zero. 
>The nr_to_section() function just returns an invalid address >(something between 0 and 4096) for section in case we get zero >from the "mem_section" entry. This address is then used for >calculating "mem_map": In other architectures, the check by is_kvaddr() avoids reading invalid addresses, but it doesn't do anything in the case of s390 due to its memory management mechanism: s390x: Fix KVBASE to correct value for s390x architecture. http://lists.infradead.org/pipermail/kexec/2011-March/004930.html Finally I've understood the cause of this issue completely, thanks for your report. >mem_map = section_mem_map_addr(section); >mem_map = sparse_decode_mem_map(mem_map, section_nr); > >With the patch below I could use makedumpfile (1.5.3) successfully >on the 1TB dump with mem=1G. I attached the -D output that is >created by makedumpfile with the patch. > >But compared to my first patch it takes much longer and the resulting >dump is bigger (version 1.5.3): > > | Dump time | Dump size >-------------+-------------+----------- >First patch | 10 sec | 124 MB >Second patch | 87 minutes | 6348 MB > >No idea why the dump is bigger with the second patch. I think the time >is consumed in write_kdump_pages_cyclic() by checking for zero pages >for the whole range: I suppose this difference was resolved with the v2 of the second patch, right? > >5970 for (pfn = start_pfn; pfn < end_pfn; pfn++) { >(gdb) n >5972 if ((num_dumped % per) == 0) >(gdb) n >5978 if (!is_dumpable_cyclic(info->partial_bitmap2, pfn)) >(gdb) n >5981 num_dumped++; >(gdb) n >5983 if (!read_pfn(pfn, buf)) >(gdb) n >5989 if ((info->dump_level & DL_EXCLUDE_ZERO) >(gdb) n >5990 && is_zero_page(buf, info->page_size)) { >(gdb) n >5991 if (!write_cache(cd_header, pd_zero, sizeof(page_desc_t))) >(gdb) n >5993 pfn_zero++; >(gdb) n >5994 continue; > >(gdb) print end_pfn >$3 = 268435456 > >So the first patch would be better for my scenario. What in particular are your >concerns with that patch? 
I think the v2 of the second patch is a reasonable patch to fix the bug of get_mm_sparsemem(). Additionally, the latest patch you posted to adjust max_mapnr (which uses mem_map_data[]) is acceptable instead of the first patch. So could you re-post the two as a formal patch set? I mean patch descriptions and your signature are needed. Thanks Atsushi Kumagai >Michael > >The following patch adds the zero check for "mem_section" entries >--- > makedumpfile.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > >--- a/makedumpfile.c >+++ b/makedumpfile.c >@@ -2402,11 +2402,14 @@ nr_to_section(unsigned long nr, unsigned > { > unsigned long addr; > >- if (is_sparsemem_extreme()) >+ if (is_sparsemem_extreme()) { >+ if (mem_sec[SECTION_NR_TO_ROOT(nr)] == 0) >+ return NOT_KV_ADDR; > addr = mem_sec[SECTION_NR_TO_ROOT(nr)] + > (nr & SECTION_ROOT_MASK()) * SIZE(mem_section); >- else >+ } else { > addr = SYMBOL(mem_section) + (nr * SIZE(mem_section)); >+ } > > if (!is_kvaddr(addr)) > return NOT_KV_ADDR; >@@ -2490,10 +2493,14 @@ get_mm_sparsemem(void) > } > for (section_nr = 0; section_nr < num_section; section_nr++) { > section = nr_to_section(section_nr, mem_sec); >- mem_map = section_mem_map_addr(section); >- mem_map = sparse_decode_mem_map(mem_map, section_nr); >- if (!is_kvaddr(mem_map)) >+ if (section == NOT_KV_ADDR) { > mem_map = NOT_MEMMAP_ADDR; >+ } else { >+ mem_map = section_mem_map_addr(section); >+ mem_map = sparse_decode_mem_map(mem_map, section_nr); >+ if (!is_kvaddr(mem_map)) >+ mem_map = NOT_MEMMAP_ADDR; >+ } > pfn_start = section_nr * PAGES_PER_SECTION(); > pfn_end = pfn_start + PAGES_PER_SECTION(); > if (info->max_mapnr < pfn_end) _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 15+ messages in thread
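The KVBASE point discussed in this exchange can be shown with a toy version of the kernel-virtual-address check. This is only an illustration: the real makedumpfile test and the per-architecture KVBASE values differ, and the numbers below are made up.

```c
#include <assert.h>

/* Toy lower-bound check in the spirit of makedumpfile's is_kvaddr();
 * the per-architecture constant is KVBASE. */
int is_kvaddr(unsigned long addr, unsigned long kvbase)
{
    /* Anything below the start of the kernel mapping is rejected. */
    return addr >= kvbase;
}
```

On x86_64 a section address like 0x1018, computed from a NULL mem_section root, lies far below the kernel mapping and is rejected; on s390, where the kernel mapping starts at address 0 so KVBASE is 0, the same garbage value passes the check, which is why the explicit zero checks in the patches are needed.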
* Re: makedumpfile: get_max_mapnr() from ELF header problem 2014-03-25 1:14 ` Atsushi Kumagai @ 2014-03-25 15:24 ` Michael Holzheu 0 siblings, 0 replies; 15+ messages in thread From: Michael Holzheu @ 2014-03-25 15:24 UTC (permalink / raw) To: Atsushi Kumagai; +Cc: d.hatayama, kexec On Tue, 25 Mar 2014 01:14:21 +0000 Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> wrote: [snip] > >But it looks like get_mm_sparsemem() does not check for zero. > >The nr_to_section() function just returns an invalid address > >(something between 0 and 4096) for section in case we get zero > >from the "mem_section" entry. This address is then used for > >calculating "mem_map": > > In other architectures, the check by is_kvaddr() avoids reading > invalid addresses, but it doesn't do anything in the case > of s390 due to its memory management mechanism: > > s390x: Fix KVBASE to correct value for s390x architecture. > http://lists.infradead.org/pipermail/kexec/2011-March/004930.html Right, for s390 the zero page is valid. > Finally I've understood the cause of this issue completely, > thanks for your report. > > >mem_map = section_mem_map_addr(section); > >mem_map = sparse_decode_mem_map(mem_map, section_nr); > > > >With the patch below I could use makedumpfile (1.5.3) successfully > >on the 1TB dump with mem=1G. I attached the -D output that is > >created by makedumpfile with the patch. > > > >But compared to my first patch it takes much longer and the resulting > >dump is bigger (version 1.5.3): > > > > | Dump time | Dump size > >-------------+-------------+----------- > >First patch | 10 sec | 124 MB > >Second patch | 87 minutes | 6348 MB > > > >No idea why the dump is bigger with the second patch. I think the time > >is consumed in write_kdump_pages_cyclic() by checking for zero pages > >for the whole range: > > I suppose this difference was resolved with the v2 of the second patch, > right? Right, with the last patch the dump time and size were ok. 
[snip] > >So the first patch would be better for my scenario. What in particular are your > >concerns with that patch? > > I think the v2 of the second patch is a reasonable patch to fix the > bug of get_mm_sparsemem(). > Additionally, the latest patch you posted to adjust max_mapnr > (which uses mem_map_data[]) is acceptable instead of the first > patch. > So could you re-post the two as a formal patch set? > I mean patch descriptions and your signature are needed. Ok great! I will resend the patches. Michael _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-03-25 15:25 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-02-28 12:41 makedumpfile: get_max_mapnr() from ELF header problem Michael Holzheu 2014-03-03 3:11 ` Atsushi Kumagai 2014-03-03 9:44 ` Michael Holzheu 2014-03-11 6:22 ` Atsushi Kumagai 2014-03-11 11:35 ` Michael Holzheu 2014-03-12 4:15 ` HATAYAMA Daisuke 2014-03-12 6:01 ` Atsushi Kumagai 2014-03-12 16:18 ` Michael Holzheu 2014-03-14 8:54 ` Atsushi Kumagai 2014-03-14 14:19 ` Michael Holzheu 2014-03-19 7:14 ` Atsushi Kumagai 2014-03-19 18:29 ` Michael Holzheu 2014-03-20 10:23 ` Michael Holzheu [not found] ` <20140319180903.2c6e2b72@holzheu> 2014-03-25 1:14 ` Atsushi Kumagai 2014-03-25 15:24 ` Michael Holzheu