* [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump
@ 2013-11-05 13:45 Jingbai Ma
  2013-11-05 13:45 ` [PATCH 1/3] makedumpfile: hugepage filtering: add hugepage filtering functions Jingbai Ma
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Jingbai Ma @ 2013-11-05 13:45 UTC (permalink / raw)
  To: ptesarik, d.hatayama, kumagai-atsushi
  Cc: bhe, tom.vaden, kexec, linux-kernel, lisa.mitchell, anderson,
	ebiederm, vgoyal

This patch set intends to exclude unnecessary hugepages from the vmcore dump
file.

It requires the kernel patch that exports the necessary data structures into
vmcoreinfo: "kexec: export hugepage data structure into vmcoreinfo"
http://lists.infradead.org/pipermail/kexec/2013-November/009997.html

This patch set introduces two new dump levels, 32 and 64, to exclude all free
and active hugepages. The level that excludes all unnecessary pages is now
127.

        |      cache  cache                  free   active
  Dump  | zero without with    user  free    huge   huge
  Level | page private private data  page    page   page
 -------+----------------------------------------------------------
     0  |
     1  |  X
     2  |         X
     4  |         X      X
     8  |                       X
    16  |                             X
    32  |                                     X
    64  |                                     X      X
   127  |  X      X      X      X     X       X      X

Examples:

To exclude all unnecessary pages:
  makedumpfile -c --message-level 23 -d 127 /proc/vmcore /var/crash/kdump

To exclude all unnecessary pages but keep active hugepages:
  makedumpfile -c --message-level 23 -d 63 /proc/vmcore /var/crash/kdump

---

Jingbai Ma (3):
      makedumpfile: hugepage filtering: add hugepage filtering functions
      makedumpfile: hugepage filtering: add excluding hugepage messages
      makedumpfile: hugepage filtering: add new dump levels for manual page

 makedumpfile.8 |  170 +++++++++++++++++++++++++++--------
 makedumpfile.c |  272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 makedumpfile.h |   19 ++++
 print_info.c   |   12 +-
 print_info.h   |    2
 5 files changed, 431 insertions(+), 44 deletions(-)

-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH 1/3] makedumpfile: hugepage filtering: add hugepage filtering functions 2013-11-05 13:45 [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Jingbai Ma @ 2013-11-05 13:45 ` Jingbai Ma 2013-11-05 13:45 ` [PATCH 2/3] makedumpfile: hugepage filtering: add excluding hugepage messages Jingbai Ma ` (2 subsequent siblings) 3 siblings, 0 replies; 25+ messages in thread From: Jingbai Ma @ 2013-11-05 13:45 UTC (permalink / raw) To: ptesarik, d.hatayama, kumagai-atsushi Cc: bhe, tom.vaden, kexec, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal Add functions to exclude hugepage from vmcore dump. Signed-off-by: Jingbai Ma <jingbai.ma@hp.com> --- makedumpfile.c | 272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ makedumpfile.h | 19 ++++ 2 files changed, 289 insertions(+), 2 deletions(-) diff --git a/makedumpfile.c b/makedumpfile.c index b42565c..f0b2531 100644 --- a/makedumpfile.c +++ b/makedumpfile.c @@ -46,6 +46,8 @@ unsigned long long pfn_cache_private; unsigned long long pfn_user; unsigned long long pfn_free; unsigned long long pfn_hwpoison; +unsigned long long pfn_free_huge; +unsigned long long pfn_active_huge; unsigned long long num_dumped; @@ -1038,6 +1040,7 @@ get_symbol_info(void) SYMBOL_INIT(mem_map, "mem_map"); SYMBOL_INIT(vmem_map, "vmem_map"); SYMBOL_INIT(mem_section, "mem_section"); + SYMBOL_INIT(hstates, "hstates"); SYMBOL_INIT(pkmap_count, "pkmap_count"); SYMBOL_INIT_NEXT(pkmap_count_next, "pkmap_count"); SYMBOL_INIT(system_utsname, "system_utsname"); @@ -1174,6 +1177,19 @@ get_structure_info(void) OFFSET_INIT(list_head.prev, "list_head", "prev"); /* + * Get offsets of the hstate's members. 
+ */ + SIZE_INIT(hstate, "hstate"); + OFFSET_INIT(hstate.order, "hstate", "order"); + OFFSET_INIT(hstate.nr_huge_pages, "hstate", "nr_huge_pages"); + OFFSET_INIT(hstate.free_huge_pages, "hstate", "free_huge_pages"); + OFFSET_INIT(hstate.hugepage_activelist, "hstate", + "hugepage_activelist"); + OFFSET_INIT(hstate.hugepage_freelists, "hstate", "hugepage_freelists"); + MEMBER_ARRAY_LENGTH_INIT(hstate.hugepage_freelists, "hstate", + "hugepage_freelists"); + + /* * Get offsets of the node_memblk_s's members. */ SIZE_INIT(node_memblk_s, "node_memblk_s"); @@ -1555,6 +1571,7 @@ write_vmcoreinfo_data(void) WRITE_SYMBOL("mem_map", mem_map); WRITE_SYMBOL("vmem_map", vmem_map); WRITE_SYMBOL("mem_section", mem_section); + WRITE_SYMBOL("hstates", hstates); WRITE_SYMBOL("pkmap_count", pkmap_count); WRITE_SYMBOL("pkmap_count_next", pkmap_count_next); WRITE_SYMBOL("system_utsname", system_utsname); @@ -1590,6 +1607,7 @@ write_vmcoreinfo_data(void) WRITE_STRUCTURE_SIZE("zone", zone); WRITE_STRUCTURE_SIZE("free_area", free_area); WRITE_STRUCTURE_SIZE("list_head", list_head); + WRITE_STRUCTURE_SIZE("hstate", hstate); WRITE_STRUCTURE_SIZE("node_memblk_s", node_memblk_s); WRITE_STRUCTURE_SIZE("nodemask_t", nodemask_t); WRITE_STRUCTURE_SIZE("pageflags", pageflags); @@ -1628,6 +1646,13 @@ write_vmcoreinfo_data(void) WRITE_MEMBER_OFFSET("vm_struct.addr", vm_struct.addr); WRITE_MEMBER_OFFSET("vmap_area.va_start", vmap_area.va_start); WRITE_MEMBER_OFFSET("vmap_area.list", vmap_area.list); + WRITE_MEMBER_OFFSET("hstate.order", hstate.order); + WRITE_MEMBER_OFFSET("hstate.nr_huge_pages", hstate.nr_huge_pages); + WRITE_MEMBER_OFFSET("hstate.free_huge_pages", hstate.free_huge_pages); + WRITE_MEMBER_OFFSET("hstate.hugepage_activelist", + hstate.hugepage_activelist); + WRITE_MEMBER_OFFSET("hstate.hugepage_freelists", + hstate.hugepage_freelists); WRITE_MEMBER_OFFSET("log.ts_nsec", log.ts_nsec); WRITE_MEMBER_OFFSET("log.len", log.len); WRITE_MEMBER_OFFSET("log.text_len", log.text_len); @@ -1647,6 
+1672,9 @@ write_vmcoreinfo_data(void) WRITE_ARRAY_LENGTH("zone.free_area", zone.free_area); WRITE_ARRAY_LENGTH("free_area.free_list", free_area.free_list); + WRITE_ARRAY_LENGTH("hstate.hugepage_freelists", + hstate.hugepage_freelists); + WRITE_NUMBER("NR_FREE_PAGES", NR_FREE_PAGES); WRITE_NUMBER("N_ONLINE", N_ONLINE); @@ -1659,6 +1687,8 @@ write_vmcoreinfo_data(void) WRITE_NUMBER("PAGE_BUDDY_MAPCOUNT_VALUE", PAGE_BUDDY_MAPCOUNT_VALUE); + WRITE_NUMBER("HUGE_MAX_HSTATE", HUGE_MAX_HSTATE); + /* * write the source file of 1st kernel */ @@ -1874,6 +1904,7 @@ read_vmcoreinfo(void) READ_SYMBOL("mem_map", mem_map); READ_SYMBOL("vmem_map", vmem_map); READ_SYMBOL("mem_section", mem_section); + READ_SYMBOL("hstates", hstates); READ_SYMBOL("pkmap_count", pkmap_count); READ_SYMBOL("pkmap_count_next", pkmap_count_next); READ_SYMBOL("system_utsname", system_utsname); @@ -1906,6 +1937,7 @@ read_vmcoreinfo(void) READ_STRUCTURE_SIZE("zone", zone); READ_STRUCTURE_SIZE("free_area", free_area); READ_STRUCTURE_SIZE("list_head", list_head); + READ_STRUCTURE_SIZE("hstate", hstate); READ_STRUCTURE_SIZE("node_memblk_s", node_memblk_s); READ_STRUCTURE_SIZE("nodemask_t", nodemask_t); READ_STRUCTURE_SIZE("pageflags", pageflags); @@ -1940,6 +1972,13 @@ read_vmcoreinfo(void) READ_MEMBER_OFFSET("vm_struct.addr", vm_struct.addr); READ_MEMBER_OFFSET("vmap_area.va_start", vmap_area.va_start); READ_MEMBER_OFFSET("vmap_area.list", vmap_area.list); + READ_MEMBER_OFFSET("hstate.order", hstate.order); + READ_MEMBER_OFFSET("hstate.nr_huge_pages", hstate.nr_huge_pages); + READ_MEMBER_OFFSET("hstate.free_huge_pages", hstate.free_huge_pages); + READ_MEMBER_OFFSET("hstate.hugepage_activelist", + hstate.hugepage_activelist); + READ_MEMBER_OFFSET("hstate.hugepage_freelists", + hstate.hugepage_freelists); READ_MEMBER_OFFSET("log.ts_nsec", log.ts_nsec); READ_MEMBER_OFFSET("log.len", log.len); READ_MEMBER_OFFSET("log.text_len", log.text_len); @@ -1950,6 +1989,8 @@ read_vmcoreinfo(void) 
READ_ARRAY_LENGTH("node_memblk", node_memblk); READ_ARRAY_LENGTH("zone.free_area", zone.free_area); READ_ARRAY_LENGTH("free_area.free_list", free_area.free_list); + READ_ARRAY_LENGTH("hstate.hugepage_freelists", + hstate.hugepage_freelists); READ_ARRAY_LENGTH("node_remap_start_pfn", node_remap_start_pfn); READ_NUMBER("NR_FREE_PAGES", NR_FREE_PAGES); @@ -1966,6 +2007,8 @@ read_vmcoreinfo(void) READ_NUMBER("PAGE_BUDDY_MAPCOUNT_VALUE", PAGE_BUDDY_MAPCOUNT_VALUE); + READ_NUMBER("HUGE_MAX_HSTATE", HUGE_MAX_HSTATE); + return TRUE; } @@ -4040,6 +4083,214 @@ exclude_free_page(void) return TRUE; } +inline int +clear_huge_page(unsigned long long pfn, unsigned int order) +{ + unsigned int i; + + DEBUG_MSG("Exclude huge page. start pfn: %lld, order: %d\n", + pfn, order); + + for (i = 0; i < (1 << order); i++) { + if (!clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { + ERRMSG("Can't clear 2nd bitmap! pfn=0x%llx\n", pfn + i); + return FALSE; + } + } + + return TRUE; +} + +int +_exclude_huge_page(void) +{ + int i, node, freelist_length; + unsigned long curr_hstate, curr_page, head, curr, previous, curr_prev; + struct timeval tv_start; + unsigned long long pfn; + unsigned int order; + unsigned long nr_huge_pages, free_huge_pages, active_huge_pages; + + freelist_length = ARRAY_LENGTH(hstate.hugepage_freelists); + /* Exclude free huge pages */ + if (info->dump_level & (DL_EXCLUDE_FREE_HUGE + | DL_EXCLUDE_ACTIVE_HUGE)) { + gettimeofday(&tv_start, NULL); + for (i = 0; i < NUMBER(HUGE_MAX_HSTATE); i++) { + curr_hstate = SYMBOL(hstates) + SIZE(hstate) * i; + /* Read order */ + if (!readmem(VADDR, + curr_hstate + OFFSET(hstate.order), + &order, sizeof(order))) { + ERRMSG("Can't get hstate.order!"); + return FALSE; + } + /* Read free_huge_pages */ + if (!readmem(VADDR, + curr_hstate + OFFSET(hstate.free_huge_pages), + &free_huge_pages, sizeof(free_huge_pages))) { + ERRMSG("Can't get hstate.free_huge_pages!"); + return FALSE; + } + for (node = 0; node < freelist_length; node++) { + /* head = 
hstate.hugepage_freelists[node] */ + head = curr_hstate + + OFFSET(hstate.hugepage_freelists) + + SIZE(list_head) * node; + if (!readmem(VADDR, + head + OFFSET(list_head.next), + &curr, sizeof(curr))) { + ERRMSG("Can't get free list!"); + return FALSE; + } + curr_prev = head; + /* Walking free list of the node */ + while (head != curr && curr != 0) { + print_progress(PROGRESS_FREE_HUGE, + pfn_free_huge, free_huge_pages); + if (!readmem(VADDR, + curr + OFFSET(list_head.prev), + &previous, sizeof(previous))) { + ERRMSG("Can't get free list!"); + return FALSE; + } + if (previous != curr_prev) { + ERRMSG("Free list is broken!"); + return FALSE; + } + curr_page = curr - OFFSET(page.lru); + pfn = page_to_pfn(curr_page); + if (!clear_huge_page(pfn, order)) + return FALSE; + pfn_free_huge++; + curr_prev = curr; + if (!readmem(VADDR, + curr + OFFSET(list_head.next), + &curr, sizeof(curr))) { + ERRMSG("Can't get free list!"); + return FALSE; + } + } + } + } + /* + * print [100 %] + */ + print_progress(PROGRESS_FREE_HUGE, 1, 1); + print_execution_time(PROGRESS_FREE_HUGE, &tv_start); + } + + /* Exclude active huge pages */ + if (info->dump_level & DL_EXCLUDE_ACTIVE_HUGE) { + gettimeofday(&tv_start, NULL); + for (i = 0; i < NUMBER(HUGE_MAX_HSTATE); i++) { + curr_hstate = SYMBOL(hstates) + SIZE(hstate) * i; + /* Read order */ + if (!readmem(VADDR, + curr_hstate + OFFSET(hstate.order), + &order, sizeof(order))) { + ERRMSG("Can't get hstate.order!"); + return FALSE; + } + /* Read nr_huge_pages */ + if (!readmem(VADDR, + curr_hstate + OFFSET(hstate.nr_huge_pages), + &nr_huge_pages, sizeof(nr_huge_pages))) { + ERRMSG("Can't get hstate.nr_huge_pages!"); + return FALSE; + } + /* Read free_huge_pages */ + if (!readmem(VADDR, + curr_hstate + OFFSET(hstate.free_huge_pages), + &free_huge_pages, sizeof(free_huge_pages))) { + ERRMSG("Can't get hstate.free_huge_pages!"); + return FALSE; + } + if (nr_huge_pages < free_huge_pages) { + ERRMSG("nr_huge_pages < free_huge_pages!"); + return FALSE; 
+ } + active_huge_pages = nr_huge_pages - free_huge_pages; + /* head = hstate.hugepage_freelists[node] */ + head = curr_hstate + OFFSET(hstate.hugepage_activelist); + if (!readmem(VADDR, head + OFFSET(list_head.next), + &curr, sizeof(curr))) { + ERRMSG("Can't get active list!"); + } + curr_prev = head; + /* Walking active list */ + while (head != curr && curr != 0) { + print_progress(PROGRESS_ACTIVE_HUGE, + pfn_active_huge, + active_huge_pages); + if (!readmem(VADDR, + curr + OFFSET(list_head.prev), + &previous, sizeof(previous))) { + ERRMSG("Can't get active list!"); + return FALSE; + } + if (previous != curr_prev) { + ERRMSG("Active list is broken!"); + return FALSE; + } + curr_page = curr - OFFSET(page.lru); + pfn = page_to_pfn(curr_page); + if (!clear_huge_page(pfn, order)) + return FALSE; + pfn_active_huge++; + curr_prev = curr; + if (!readmem(VADDR, + curr + OFFSET(list_head.next), + &curr, sizeof(curr))) { + ERRMSG("Can't get active list!"); + return FALSE; + } + } + } + /* + * print [100 %] + */ + print_progress(PROGRESS_ACTIVE_HUGE, 1, 1); + print_execution_time(PROGRESS_ACTIVE_HUGE, &tv_start); + } + + DEBUG_MSG("\n"); + DEBUG_MSG("free huge pages : %lld\n", pfn_free_huge); + DEBUG_MSG("active huge pages: %lld\n", pfn_active_huge); + + return TRUE; +} + +int +exclude_huge_page(void) +{ + /* + * Check having necessary information. 
+ */ + if (SYMBOL(hstates) == NOT_FOUND_SYMBOL) + ERRMSG("Can't get necessary symbols for huge pages.\n"); + + if ((SIZE(hstate) == NOT_FOUND_STRUCTURE) + || (OFFSET(hstate.order) == NOT_FOUND_STRUCTURE) + || (OFFSET(hstate.nr_huge_pages) == NOT_FOUND_STRUCTURE) + || (OFFSET(hstate.free_huge_pages) == NOT_FOUND_STRUCTURE) + || (OFFSET(hstate.hugepage_activelist) == NOT_FOUND_STRUCTURE) + || (OFFSET(hstate.hugepage_freelists) == NOT_FOUND_STRUCTURE) + || (ARRAY_LENGTH(hstate.hugepage_freelists) + == NOT_FOUND_STRUCTURE)) { + ERRMSG("Can't get necessary structures for huge pages.\n"); + return FALSE; + } + + /* + * Detect huge pages and update 2nd-bitmap. + */ + if (!_exclude_huge_page()) + return FALSE; + + return TRUE; +} + /* * Let C be a cyclic buffer size and B a bitmap size used for * representing maximum block size managed by buddy allocator. @@ -4532,6 +4783,13 @@ exclude_unnecessary_pages_cyclic(void) return FALSE; /* + * Exclude huge pages. + */ + if (info->dump_level & (DL_EXCLUDE_FREE_HUGE | DL_EXCLUDE_ACTIVE_HUGE)) + if (!exclude_huge_page()) + return FALSE; + + /* * Exclude cache pages, cache private pages, user data pages, * free pages and hwpoison pages. */ @@ -4661,6 +4919,13 @@ create_2nd_bitmap(void) return FALSE; /* + * Exclude huge pages. + */ + if (info->dump_level & (DL_EXCLUDE_FREE_HUGE | DL_EXCLUDE_ACTIVE_HUGE)) + if (!exclude_huge_page()) + return FALSE; + + /* * Exclude Xen user domain. 
*/ if (info->flag_exclude_xen_dom) { @@ -6513,6 +6778,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d */ pfn_zero = pfn_cache = pfn_cache_private = 0; pfn_user = pfn_free = pfn_hwpoison = 0; + pfn_free_huge = pfn_active_huge = 0; pfn_memhole = info->max_mapnr; cd_header->offset @@ -7416,7 +7682,8 @@ print_report(void) pfn_original = info->max_mapnr - pfn_memhole; pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private - + pfn_user + pfn_free + pfn_hwpoison; + + pfn_user + pfn_free + pfn_hwpoison + + pfn_free_huge + pfn_active_huge; shrinking = (pfn_original - pfn_excluded) * 100; shrinking = shrinking / pfn_original; @@ -7429,6 +7696,9 @@ print_report(void) pfn_cache_private); REPORT_MSG(" User process data pages : 0x%016llx\n", pfn_user); REPORT_MSG(" Free pages : 0x%016llx\n", pfn_free); + REPORT_MSG(" Free hugepage pages : 0x%016llx\n", pfn_free_huge); + REPORT_MSG(" Active hugepage pages : 0x%016llx\n", + pfn_active_huge); REPORT_MSG(" Hwpoison pages : 0x%016llx\n", pfn_hwpoison); REPORT_MSG(" Remaining pages : 0x%016llx\n", pfn_original - pfn_excluded); diff --git a/makedumpfile.h b/makedumpfile.h index a5826e0..1a0a5fa 100644 --- a/makedumpfile.h +++ b/makedumpfile.h @@ -178,7 +178,7 @@ isAnon(unsigned long mapping) * Dump Level */ #define MIN_DUMP_LEVEL (0) -#define MAX_DUMP_LEVEL (31) +#define MAX_DUMP_LEVEL (127) #define NUM_ARRAY_DUMP_LEVEL (MAX_DUMP_LEVEL + 1) /* enough to allocate all the dump_level */ #define DL_EXCLUDE_ZERO (0x001) /* Exclude Pages filled with Zeros */ @@ -189,6 +189,9 @@ isAnon(unsigned long mapping) #define DL_EXCLUDE_USER_DATA (0x008) /* Exclude UserProcessData Pages */ #define DL_EXCLUDE_FREE (0x010) /* Exclude Free Pages */ +#define DL_EXCLUDE_FREE_HUGE (0x020) /* Exclude Free Huge Pages */ +#define DL_EXCLUDE_ACTIVE_HUGE (0x040) /* Exclude Active Huge Pages */ + /* * For parse_line() @@ -1098,6 +1101,7 @@ struct symbol_table { unsigned long long mem_map; unsigned long long vmem_map; unsigned 
long long mem_section; + unsigned long long hstates; unsigned long long pkmap_count; unsigned long long pkmap_count_next; unsigned long long system_utsname; @@ -1174,6 +1178,7 @@ struct size_table { long zone; long free_area; long list_head; + long hstate; long node_memblk_s; long nodemask_t; @@ -1232,6 +1237,13 @@ struct offset_table { struct free_area { long free_list; } free_area; + struct hstate { + long order; + long nr_huge_pages; + long free_huge_pages; + long hugepage_activelist; + long hugepage_freelists; + } hstate; struct list_head { long next; long prev; @@ -1368,6 +1380,9 @@ struct array_table { struct free_area_at { long free_list; } free_area; + struct hstate_at { + long hugepage_freelists; + } hstate; struct kimage_at { long segment; } kimage; @@ -1388,6 +1403,8 @@ struct number_table { long PG_hwpoison; long PAGE_BUDDY_MAPCOUNT_VALUE; + + long HUGE_MAX_HSTATE; }; struct srcfile_table { ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/3] makedumpfile: hugepage filtering: add excluding hugepage messages 2013-11-05 13:45 [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Jingbai Ma 2013-11-05 13:45 ` [PATCH 1/3] makedumpfile: hugepage filtering: add hugepage filtering functions Jingbai Ma @ 2013-11-05 13:45 ` Jingbai Ma 2013-11-05 13:46 ` [PATCH 3/3] makedumpfile: hugepage filtering: add new dump levels for manual page Jingbai Ma 2013-11-05 20:26 ` [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Vivek Goyal 3 siblings, 0 replies; 25+ messages in thread From: Jingbai Ma @ 2013-11-05 13:45 UTC (permalink / raw) To: ptesarik, d.hatayama, kumagai-atsushi Cc: bhe, tom.vaden, kexec, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal Add messages for print_info. Signed-off-by: Jingbai Ma <jingbai.ma@hp.com> --- print_info.c | 12 +++++++----- print_info.h | 2 ++ 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/print_info.c b/print_info.c index 06939e0..978d9fb 100644 --- a/print_info.c +++ b/print_info.c @@ -103,17 +103,19 @@ print_usage(void) MSG(" The maximum of Dump_Level is 31.\n"); MSG(" Note that Dump_Level for Xen dump filtering is 0 or 1.\n"); MSG("\n"); - MSG(" | cache cache\n"); - MSG(" Dump | zero without with user free\n"); - MSG(" Level | page private private data page\n"); - MSG(" -------+---------------------------------------\n"); + MSG(" | cache cache free active\n"); + MSG(" Dump | zero without with user free huge huge\n"); + MSG(" Level | page private private data page page page\n"); + MSG(" -------+----------------------------------------------------------\n"); MSG(" 0 |\n"); MSG(" 1 | X\n"); MSG(" 2 | X\n"); MSG(" 4 | X X\n"); MSG(" 8 | X\n"); MSG(" 16 | X\n"); - MSG(" 31 | X X X X X\n"); + MSG(" 32 | X\n"); + MSG(" 64 | X X\n"); + MSG(" 127 | X X X X X X X\n"); MSG("\n"); MSG(" [-E]:\n"); MSG(" Create DUMPFILE in the ELF format.\n"); diff --git a/print_info.h b/print_info.h index 01e3706..8461df6 100644 --- a/print_info.h +++ 
b/print_info.h @@ -35,6 +35,8 @@ void print_execution_time(char *step_name, struct timeval *tv_start); #define PROGRESS_HOLES "Checking for memory holes " #define PROGRESS_UNN_PAGES "Excluding unnecessary pages" #define PROGRESS_FREE_PAGES "Excluding free pages " +#define PROGRESS_FREE_HUGE "Excluding free huge pages " +#define PROGRESS_ACTIVE_HUGE "Excluding active huge pages" #define PROGRESS_ZERO_PAGES "Excluding zero pages " #define PROGRESS_XEN_DOMAIN "Excluding xen user domain " ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/3] makedumpfile: hugepage filtering: add new dump levels for manual page 2013-11-05 13:45 [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Jingbai Ma 2013-11-05 13:45 ` [PATCH 1/3] makedumpfile: hugepage filtering: add hugepage filtering functions Jingbai Ma 2013-11-05 13:45 ` [PATCH 2/3] makedumpfile: hugepage filtering: add excluding hugepage messages Jingbai Ma @ 2013-11-05 13:46 ` Jingbai Ma 2013-11-05 20:26 ` [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Vivek Goyal 3 siblings, 0 replies; 25+ messages in thread From: Jingbai Ma @ 2013-11-05 13:46 UTC (permalink / raw) To: ptesarik, d.hatayama, kumagai-atsushi Cc: bhe, tom.vaden, kexec, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal Add new dump levels for makedumpfile manual page. Signed-off-by: Jingbai Ma <jingbai.ma@hp.com> --- makedumpfile.8 | 170 ++++++++++++++++++++++++++++++++++++++++++++------------ 1 files changed, 133 insertions(+), 37 deletions(-) diff --git a/makedumpfile.8 b/makedumpfile.8 index adeb811..70e8732 100644 --- a/makedumpfile.8 +++ b/makedumpfile.8 @@ -164,43 +164,139 @@ by dump_level 11, makedumpfile retries it by dump_level 31. 
.br # makedumpfile \-d 11,31 \-x vmlinux /proc/vmcore dumpfile - | |cache |cache | | - dump | zero |without|with | user | free - level | page |private|private| data | page -.br -\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\- - 0 | | | | | - 1 | X | | | | - 2 | | X | | | - 3 | X | X | | | - 4 | | X | X | | - 5 | X | X | X | | - 6 | | X | X | | - 7 | X | X | X | | - 8 | | | | X | - 9 | X | | | X | - 10 | | X | | X | - 11 | X | X | | X | - 12 | | X | X | X | - 13 | X | X | X | X | - 14 | | X | X | X | - 15 | X | X | X | X | - 16 | | | | | X - 17 | X | | | | X - 18 | | X | | | X - 19 | X | X | | | X - 20 | | X | X | | X - 21 | X | X | X | | X - 22 | | X | X | | X - 23 | X | X | X | | X - 24 | | | | X | X - 25 | X | | | X | X - 26 | | X | | X | X - 27 | X | X | | X | X - 28 | | X | X | X | X - 29 | X | X | X | X | X - 30 | | X | X | X | X - 31 | X | X | X | X | X + | |cache |cache | | | free | active + dump | zero |without|with | user | free | huge | huge + level | page |private|private| data | page | page | page +.br +\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-+\-\-\-\-\-\-\-\- + 0 | | | | | | | + 1 | X | | | | | | + 2 | | X | | | | | + 3 | X | X | | | | | + 4 | | X | X | | | | + 5 | X | X | X | | | | + 6 | | X | X | | | | + 7 | X | X | X | | | | + 8 | | | | X | | | + 9 | X | | | X | | | + 10 | | X | | X | | | + 11 | X | X | | X | | | + 12 | | X | X | X | | | + 13 | X | X | X | X | | | + 14 | | X | X | X | | | + 15 | X | X | X | X | | | + 16 | | | | | X | | + 17 | X | | | | X | | + 18 | | X | | | X | | + 19 | X | X | | | X | | + 20 | | X | X | | X | | + 21 | X | X | X | | X | | + 22 | | X | X | | X | | + 23 | X | X | X | | X | | + 24 | | | | X | X | | + 25 | X | | | X | X | | + 26 | | X | | X | X | | + 27 | X | X | | X | X | | + 28 | | X | X | X | X | | + 29 | X | X | X | X | X | | + 30 | | X | X | X | X | | + 31 | X | X | X | X | X | | + 32 | | | | | | X | + 33 | X | | | | | X | 
+ 34 | | X | | | | X | + 35 | X | X | | | | X | + 36 | | X | X | | | X | + 37 | X | X | X | | | X | + 38 | | X | X | | | X | + 39 | X | X | X | | | X | + 40 | | | | X | | X | + 41 | X | | | X | | X | + 42 | | X | | X | | X | + 43 | X | X | | X | | X | + 44 | | X | X | X | | X | + 45 | X | X | X | X | | X | + 46 | | X | X | X | | X | + 47 | X | X | X | X | | X | + 48 | | | | | X | X | + 49 | X | | | | X | X | + 50 | | X | | | X | X | + 51 | X | X | | | X | X | + 52 | | X | X | | X | X | + 53 | X | X | X | | X | X | + 54 | | X | X | | X | X | + 55 | X | X | X | | X | X | + 56 | | | | X | X | X | + 57 | X | | | X | X | X | + 58 | | X | | X | X | X | + 59 | X | X | | X | X | X | + 60 | | X | X | X | X | X | + 61 | X | X | X | X | X | X | + 62 | | X | X | X | X | X | + 63 | X | X | X | X | X | X | + 64 | | | | | | X | X + 65 | X | | | | | X | X + 66 | | X | | | | X | X + 67 | X | X | | | | X | X + 68 | | X | X | | | X | X + 69 | X | X | X | | | X | X + 70 | | X | X | | | X | X + 71 | X | X | X | | | X | X + 72 | | | | X | | X | X + 73 | X | | | X | | X | X + 74 | | X | | X | | X | X + 75 | X | X | | X | | X | X + 76 | | X | X | X | | X | X + 77 | X | X | X | X | | X | X + 78 | | X | X | X | | X | X + 79 | X | X | X | X | | X | X + 80 | | | | | X | X | X + 81 | X | | | | X | X | X + 82 | | X | | | X | X | X + 83 | X | X | | | X | X | X + 84 | | X | X | | X | X | X + 85 | X | X | X | | X | X | X + 86 | | X | X | | X | X | X + 87 | X | X | X | | X | X | X + 88 | | | | X | X | X | X + 89 | X | | | X | X | X | X + 90 | | X | | X | X | X | X + 91 | X | X | | X | X | X | X + 92 | | X | X | X | X | X | X + 93 | X | X | X | X | X | X | X + 94 | | X | X | X | X | X | X + 95 | X | X | X | X | X | X | X + 96 | | | | | | X | X + 97 | X | | | | | X | X + 98 | | X | | | | X | X + 99 | X | X | | | | X | X + 100 | | X | X | | | X | X + 101 | X | X | X | | | X | X + 102 | | X | X | | | X | X + 103 | X | X | X | | | X | X + 104 | | | | X | | X | X + 105 | X | | | X | | X | X + 106 | | X | 
| X | | X | X + 107 | X | X | | X | | X | X + 108 | | X | X | X | | X | X + 109 | X | X | X | X | | X | X + 110 | | X | X | X | | X | X + 111 | X | X | X | X | | X | X + 112 | | | | | X | X | X + 113 | X | | | | X | X | X + 114 | | X | | | X | X | X + 115 | X | X | | | X | X | X + 116 | | X | X | | X | X | X + 117 | X | X | X | | X | X | X + 118 | | X | X | | X | X | X + 119 | X | X | X | | X | X | X + 120 | | | | X | X | X | X + 121 | X | | | X | X | X | X + 122 | | X | | X | X | X | X + 123 | X | X | | X | X | X | X + 124 | | X | X | X | X | X | X + 125 | X | X | X | X | X | X | X + 126 | | X | X | X | X | X | X + 127 | X | X | X | X | X | X | X .TP ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump
  2013-11-05 13:45 [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Jingbai Ma
                   ` (2 preceding siblings ...)
  2013-11-05 13:46 ` [PATCH 3/3] makedumpfile: hugepage filtering: add new dump levels for manual page Jingbai Ma
@ 2013-11-05 20:26 ` Vivek Goyal
  2013-11-06  1:47   ` Jingbai Ma
  2013-11-06  2:21   ` Atsushi Kumagai
  3 siblings, 2 replies; 25+ messages in thread
From: Vivek Goyal @ 2013-11-05 20:26 UTC (permalink / raw)
  To: Jingbai Ma
  Cc: ptesarik, d.hatayama, kumagai-atsushi, bhe, tom.vaden, kexec,
	linux-kernel, lisa.mitchell, anderson, ebiederm

On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> This patch set intend to exclude unnecessary hugepages from vmcore dump file.
> 
> This patch requires the kernel patch to export necessary data structures into
> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> 
> This patch introduce two new dump levels 32 and 64 to exclude all unused and
> active hugepages. The level to exclude all unnecessary pages will be 127 now.

Interesting. Why should hugepages be treated any differently than normal
pages?

If the user asked to filter out free pages, then they should be filtered,
and it should not matter whether they are huge pages or not.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump
  2013-11-05 20:26 ` [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Vivek Goyal
@ 2013-11-06  1:47 ` Jingbai Ma
  2013-11-06  1:53   ` Vivek Goyal
  1 sibling, 1 reply; 25+ messages in thread
From: Jingbai Ma @ 2013-11-06 1:47 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jingbai Ma, ptesarik, d.hatayama, kumagai-atsushi, bhe,
	tom.vaden, kexec, linux-kernel, lisa.mitchell, anderson, ebiederm

On 11/06/2013 04:26 AM, Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intend to exclude unnecessary hugepages from vmcore dump file.
>>
>> This patch requires the kernel patch to export necessary data structures into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduce two new dump levels 32 and 64 to exclude all unused and
>> active hugepages. The level to exclude all unnecessary pages will be 127 now.
>
> Interesting. Why hugepages should be treated any differentely than normal
> pages?
>
> If user asked to filter out free page, then it should be filtered and
> it should not matter whether it is a huge page or not?

Yes, free hugepages should be filtered out along with other free pages.
That sounds reasonable.

But for active hugepages, I would rather offer users more
choice/flexibility (which may be a bad idea). I'm fine with filtering
active hugepages together with other user data pages.

Any other comments?

>
> Thanks
> Vivek

-- 
Thanks,
Jingbai Ma

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump
  2013-11-06  1:47 ` Jingbai Ma
@ 2013-11-06  1:53   ` Vivek Goyal
  0 siblings, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2013-11-06 1:53 UTC (permalink / raw)
  To: Jingbai Ma
  Cc: ptesarik, d.hatayama, kumagai-atsushi, bhe, tom.vaden, kexec,
	linux-kernel, lisa.mitchell, anderson, ebiederm

On Wed, Nov 06, 2013 at 09:47:49AM +0800, Jingbai Ma wrote:
> On 11/06/2013 04:26 AM, Vivek Goyal wrote:
> >On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
> >>This patch set intend to exclude unnecessary hugepages from vmcore dump file.
> >>
> >>This patch requires the kernel patch to export necessary data structures into
> >>vmcore: "kexec: export hugepage data structure into vmcoreinfo"
> >>http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
> >>
> >>This patch introduce two new dump levels 32 and 64 to exclude all unused and
> >>active hugepages. The level to exclude all unnecessary pages will be 127 now.
> >
> >Interesting. Why hugepages should be treated any differentely than normal
> >pages?
> >
> >If user asked to filter out free page, then it should be filtered and
> >it should not matter whether it is a huge page or not?
> 
> Yes, free hugepages should be filtered out with other free pages. It
> sounds reasonable.
> 
> But for active hugepages, I would offer user more
> choices/flexibility. (maybe bad).
> I'm OK to filter active hugepages with other user data page.
> 
> Any other comments?

I really can't see why hugepages are different from regular pages when it
comes to filtering. IMO, we really should not create filtering
options/levels only for huge pages, until and unless there is a strong
use case.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump
  2013-11-05 20:26 ` [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Vivek Goyal
  2013-11-06  1:47 ` Jingbai Ma
@ 2013-11-06  2:21 ` Atsushi Kumagai
  2013-11-06 14:23   ` Vivek Goyal
  2013-11-07  0:54   ` HATAYAMA Daisuke
  1 sibling, 2 replies; 25+ messages in thread
From: Atsushi Kumagai @ 2013-11-06 2:21 UTC (permalink / raw)
  To: vgoyal
  Cc: jingbai.ma, bhe, tom.vaden, kexec, ptesarik, linux-kernel,
	lisa.mitchell, d.hatayama, ebiederm, anderson

(2013/11/06 5:27), Vivek Goyal wrote:
> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote:
>> This patch set intend to exclude unnecessary hugepages from vmcore dump file.
>>
>> This patch requires the kernel patch to export necessary data structures into
>> vmcore: "kexec: export hugepage data structure into vmcoreinfo"
>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html
>>
>> This patch introduce two new dump levels 32 and 64 to exclude all unused and
>> active hugepages. The level to exclude all unnecessary pages will be 127 now.
>
> Interesting. Why hugepages should be treated any differentely than normal
> pages?
>
> If user asked to filter out free page, then it should be filtered and
> it should not matter whether it is a huge page or not?

I'm making an RFC patch of hugepage filtering based on that policy; the
prototype version is attached. It can also filter out THPs, and it is
suitable for cyclic processing because it depends only on mem_map, and
looking up mem_map can be divided into cycles. This is the same idea as
page_is_buddy(), so I think this approach is better.

-- 
Thanks
Atsushi Kumagai

From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Date: Wed, 6 Nov 2013 10:10:43 +0900
Subject: [PATCH] [RFC] Exclude hugepages.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> --- makedumpfile.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--- makedumpfile.h | 8 ++++ 2 files changed, 125 insertions(+), 5 deletions(-) diff --git a/makedumpfile.c b/makedumpfile.c index 428c53e..75b7123 100644 --- a/makedumpfile.c +++ b/makedumpfile.c @@ -63,6 +63,7 @@ do { \ static void check_cyclic_buffer_overrun(void); static void setup_page_is_buddy(void); +static void setup_page_is_hugepage(void); void initialize_tables(void) @@ -270,6 +271,18 @@ update_mmap_range(off_t offset, int initial) { } static int +page_is_hugepage(unsigned long flags) { + if (NUMBER(PG_head) != NOT_FOUND_NUMBER) { + return isHead(flags); + } else if (NUMBER(PG_tail) != NOT_FOUND_NUMBER) { + return isTail(flags); + } else if (NUMBER(PG_compound) != NOT_FOUND_NUMBER) { + return isCompound(flags); + } + return 0; +} + +static int is_mapped_with_mmap(off_t offset) { if (info->flag_usemmap @@ -1107,6 +1120,8 @@ get_symbol_info(void) SYMBOL_ARRAY_LENGTH_INIT(node_remap_start_pfn, "node_remap_start_pfn"); + SYMBOL_INIT(free_huge_page, "free_huge_page"); + return TRUE; } @@ -1214,11 +1229,19 @@ get_structure_info(void) ENUM_NUMBER_INIT(PG_lru, "PG_lru"); ENUM_NUMBER_INIT(PG_private, "PG_private"); + ENUM_NUMBER_INIT(PG_head, "PG_head"); + ENUM_NUMBER_INIT(PG_tail, "PG_tail"); + ENUM_NUMBER_INIT(PG_compound, "PG_compound"); ENUM_NUMBER_INIT(PG_swapcache, "PG_swapcache"); ENUM_NUMBER_INIT(PG_buddy, "PG_buddy"); ENUM_NUMBER_INIT(PG_slab, "PG_slab"); ENUM_NUMBER_INIT(PG_hwpoison, "PG_hwpoison"); + if (NUMBER(PG_head) == NOT_FOUND_NUMBER && + NUMBER(PG_compound) == NOT_FOUND_NUMBER) + /* Pre-2.6.26 kernels did not have pageflags */ + NUMBER(PG_compound) = PG_compound_ORIGINAL; + ENUM_TYPE_SIZE_INIT(pageflags, "pageflags"); TYPEDEF_SIZE_INIT(nodemask_t, "nodemask_t"); @@ -1603,6 +1626,7 @@ write_vmcoreinfo_data(void) WRITE_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr); WRITE_SYMBOL("node_remap_end_vaddr", 
node_remap_end_vaddr); WRITE_SYMBOL("node_remap_start_pfn", node_remap_start_pfn); + WRITE_SYMBOL("free_huge_page", free_huge_page); /* * write the structure size of 1st kernel @@ -1685,6 +1709,9 @@ write_vmcoreinfo_data(void) WRITE_NUMBER("PG_lru", PG_lru); WRITE_NUMBER("PG_private", PG_private); + WRITE_NUMBER("PG_head", PG_head); + WRITE_NUMBER("PG_tail", PG_tail); + WRITE_NUMBER("PG_compound", PG_compound); WRITE_NUMBER("PG_swapcache", PG_swapcache); WRITE_NUMBER("PG_buddy", PG_buddy); WRITE_NUMBER("PG_slab", PG_slab); @@ -1932,6 +1959,7 @@ read_vmcoreinfo(void) READ_SYMBOL("node_remap_start_vaddr", node_remap_start_vaddr); READ_SYMBOL("node_remap_end_vaddr", node_remap_end_vaddr); READ_SYMBOL("node_remap_start_pfn", node_remap_start_pfn); + READ_SYMBOL("free_huge_page", free_huge_page); READ_STRUCTURE_SIZE("page", page); READ_STRUCTURE_SIZE("mem_section", mem_section); @@ -2000,6 +2028,9 @@ read_vmcoreinfo(void) READ_NUMBER("PG_lru", PG_lru); READ_NUMBER("PG_private", PG_private); + READ_NUMBER("PG_head", PG_head); + READ_NUMBER("PG_tail", PG_tail); + READ_NUMBER("PG_compound", PG_compound); READ_NUMBER("PG_swapcache", PG_swapcache); READ_NUMBER("PG_slab", PG_slab); READ_NUMBER("PG_buddy", PG_buddy); @@ -3126,6 +3157,9 @@ out: if (!get_value_for_old_linux()) return FALSE; + /* Get page flags for compound pages */ + setup_page_is_hugepage(); + /* use buddy identification of free pages whether cyclic or not */ /* (this can reduce pages scan of 1TB memory from 60sec to 30sec) */ if (info->dump_level & DL_EXCLUDE_FREE) @@ -4197,6 +4231,23 @@ out: "follow free lists instead of mem_map array.\n"); } +static void +setup_page_is_hugepage(void) +{ + if (NUMBER(PG_head) != NOT_FOUND_NUMBER) { + if (NUMBER(PG_tail) == NOT_FOUND_NUMBER) { + /* If PG_tail is not explicitly saved, then assume + * that it immediately follows PG_head. 
+ */ + NUMBER(PG_tail) = NUMBER(PG_head) + 1; + } + } else if ((NUMBER(PG_compound) != NOT_FOUND_NUMBER) + && (info->dump_level & DL_EXCLUDE_USER_DATA)) { + MSG("Compound page bit could not be determined: "); + MSG("huge pages will NOT be filtered.\n"); + } +} + /* * If using a dumpfile in kdump-compressed format as a source file * instead of /proc/vmcore, 1st-bitmap of a new dumpfile must be @@ -4404,8 +4455,9 @@ __exclude_unnecessary_pages(unsigned long mem_map, unsigned long long pfn_read_start, pfn_read_end, index_pg; unsigned char page_cache[SIZE(page) * PGMM_CACHED]; unsigned char *pcache; - unsigned int _count, _mapcount = 0; + unsigned int _count, _mapcount = 0, compound_order = 0; unsigned long flags, mapping, private = 0; + unsigned long hugetlb_dtor; /* * Refresh the buffer of struct page, when changing mem_map. @@ -4459,6 +4511,27 @@ __exclude_unnecessary_pages(unsigned long mem_map, flags = ULONG(pcache + OFFSET(page.flags)); _count = UINT(pcache + OFFSET(page._count)); mapping = ULONG(pcache + OFFSET(page.mapping)); + + if (index_pg < PGMM_CACHED - 1) { + compound_order = ULONG(pcache + SIZE(page) + OFFSET(page.lru) + + OFFSET(list_head.prev)); + hugetlb_dtor = ULONG(pcache + SIZE(page) + OFFSET(page.lru) + + OFFSET(list_head.next)); + } else if (pfn + 1 < pfn_end) { + unsigned char page_cache_next[SIZE(page)]; + if (!readmem(VADDR, mem_map, page_cache_next, SIZE(page))) { + ERRMSG("Can't read the buffer of struct page.\n"); + return FALSE; + } + compound_order = ULONG(page_cache_next + OFFSET(page.lru) + + OFFSET(list_head.prev)); + hugetlb_dtor = ULONG(page_cache_next + OFFSET(page.lru) + + OFFSET(list_head.next)); + } else { + compound_order = 0; + hugetlb_dtor = 0; + } + if (OFFSET(page._mapcount) != NOT_FOUND_STRUCTURE) _mapcount = UINT(pcache + OFFSET(page._mapcount)); if (OFFSET(page.private) != NOT_FOUND_STRUCTURE) @@ -4497,6 +4570,10 @@ __exclude_unnecessary_pages(unsigned long mem_map, && !isPrivate(flags) && !isAnon(mapping)) { if 
(clear_bit_on_2nd_bitmap_for_kernel(pfn)) pfn_cache++; + /* + * NOTE: If THP for cache is introduced, the check for + * compound pages is needed here. + */ } /* * Exclude the cache page with the private page. */ @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map, && !isAnon(mapping)) { if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) pfn_cache_private++; + /* + * NOTE: If THP for cache is introduced, the check for + * compound pages is needed here. + */ } /* * Exclude the data page of the user process. */ - else if ((info->dump_level & DL_EXCLUDE_USER_DATA) - && isAnon(mapping)) { - if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) - pfn_user++; + else if (info->dump_level & DL_EXCLUDE_USER_DATA) { + /* + * Exclude the anonymous pages as user pages. + */ + if (isAnon(mapping)) { + if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) + pfn_user++; + + /* + * Check the compound page + */ + if (page_is_hugepage(flags) && compound_order > 0) { + int i, nr_pages = 1 << compound_order; + + for (i = 1; i < nr_pages; ++i) { + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) + pfn_user++; + } + pfn += nr_pages - 2; + mem_map += (nr_pages - 1) * SIZE(page); + } + } + /* + * Exclude the hugetlbfs pages as user pages. + */ + else if (hugetlb_dtor == SYMBOL(free_huge_page)) { + int i, nr_pages = 1 << compound_order; + + for (i = 0; i < nr_pages; ++i) { + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) + pfn_user++; + } + pfn += nr_pages - 1; + mem_map += (nr_pages - 1) * SIZE(page); + } } /* * Exclude the hwpoison page. 
diff --git a/makedumpfile.h b/makedumpfile.h index 3a7e61a..d6ee832 100644 --- a/makedumpfile.h +++ b/makedumpfile.h @@ -74,6 +74,7 @@ int get_mem_type(void); #define PG_lru_ORIGINAL (5) #define PG_slab_ORIGINAL (7) #define PG_private_ORIGINAL (11) /* Has something at ->private */ +#define PG_compound_ORIGINAL (14) /* Is part of a compound page */ #define PG_swapcache_ORIGINAL (15) /* Swap page: swp_entry_t in private */ #define PAGE_BUDDY_MAPCOUNT_VALUE_v2_6_38 (-2) @@ -140,6 +141,9 @@ test_bit(int nr, unsigned long addr) #define isLRU(flags) test_bit(NUMBER(PG_lru), flags) #define isPrivate(flags) test_bit(NUMBER(PG_private), flags) +#define isHead(flags) test_bit(NUMBER(PG_head), flags) +#define isTail(flags) test_bit(NUMBER(PG_tail), flags) +#define isCompound(flags) test_bit(NUMBER(PG_compound), flags) #define isSwapCache(flags) test_bit(NUMBER(PG_swapcache), flags) #define isHWPOISON(flags) (test_bit(NUMBER(PG_hwpoison), flags) \ && (NUMBER(PG_hwpoison) != NOT_FOUND_NUMBER)) @@ -1124,6 +1128,7 @@ struct symbol_table { unsigned long long node_remap_start_vaddr; unsigned long long node_remap_end_vaddr; unsigned long long node_remap_start_pfn; + unsigned long long free_huge_page; /* * for Xen extraction @@ -1383,6 +1388,9 @@ struct number_table { */ long PG_lru; long PG_private; + long PG_head; + long PG_tail; + long PG_compound; long PG_swapcache; long PG_buddy; long PG_slab; -- 1.8.0.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-06 2:21 ` Atsushi Kumagai @ 2013-11-06 14:23 ` Vivek Goyal 2013-11-07 8:57 ` Jingbai Ma 2013-11-07 0:54 ` HATAYAMA Daisuke 1 sibling, 1 reply; 25+ messages in thread From: Vivek Goyal @ 2013-11-06 14:23 UTC (permalink / raw) To: Atsushi Kumagai Cc: jingbai.ma, bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, d.hatayama, ebiederm, anderson On Wed, Nov 06, 2013 at 02:21:39AM +0000, Atsushi Kumagai wrote: > (2013/11/06 5:27), Vivek Goyal wrote: > > On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote: > >> This patch set intend to exclude unnecessary hugepages from vmcore dump file. > >> > >> This patch requires the kernel patch to export necessary data structures into > >> vmcore: "kexec: export hugepage data structure into vmcoreinfo" > >> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html > >> > >> This patch introduce two new dump levels 32 and 64 to exclude all unused and > >> active hugepages. The level to exclude all unnecessary pages will be 127 now. > > > > Interesting. Why hugepages should be treated any differentely than normal > > pages? > > > > If user asked to filter out free page, then it should be filtered and > > it should not matter whether it is a huge page or not? > > I'm making a RFC patch of hugepages filtering based on such policy. > > I attach the prototype version. > It's able to filter out also THPs, and suitable for cyclic processing > because it depends on mem_map and looking up it can be divided into > cycles. This is the same idea as page_is_buddy(). > > So I think it's better. Agreed. Being able to treat hugepages in same manner as other pages sounds good. Jingbai, looks good to you? Thanks Vivek > > -- > Thanks > Atsushi Kumagai > > > From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> > Date: Wed, 6 Nov 2013 10:10:43 +0900 > Subject: [PATCH] [RFC] Exclude hugepages. 
> [full RFC patch quoted verbatim; snipped -- see Atsushi's message above for the patch itself]
> --
> 1.8.0.2
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-06 14:23 ` Vivek Goyal @ 2013-11-07 8:57 ` Jingbai Ma 2013-11-08 5:12 ` Atsushi Kumagai 0 siblings, 1 reply; 25+ messages in thread From: Jingbai Ma @ 2013-11-07 8:57 UTC (permalink / raw) To: Vivek Goyal, Atsushi Kumagai Cc: jingbai.ma, bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, d.hatayama, ebiederm, anderson On 11/06/2013 10:23 PM, Vivek Goyal wrote: > On Wed, Nov 06, 2013 at 02:21:39AM +0000, Atsushi Kumagai wrote: >> [...] >> I'm making a RFC patch of hugepages filtering based on such policy. >> >> I attach the prototype version. >> It's able to filter out also THPs, and suitable for cyclic processing >> because it depends on mem_map and looking up it can be divided into >> cycles. This is the same idea as page_is_buddy(). >> >> So I think it's better. > > Agreed. Being able to treat hugepages in same manner as other pages > sounds good. > > Jingbai, looks good to you? It looks good to me. My only concern is that this way we can only exclude all hugepages together, not the free hugepages alone. I'm not sure whether users need to dump out only the active hugepages. Kumagai-san, please correct me if I'm wrong. -- Thanks, Jingbai Ma ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-07 8:57 ` Jingbai Ma @ 2013-11-08 5:12 ` Atsushi Kumagai 2013-11-08 5:21 ` HATAYAMA Daisuke 0 siblings, 1 reply; 25+ messages in thread From: Atsushi Kumagai @ 2013-11-08 5:12 UTC (permalink / raw) To: jingbai.ma Cc: vgoyal, bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, d.hatayama, anderson, ebiederm Hello Jingbai, (2013/11/07 17:58), Jingbai Ma wrote: > [...] > My only concern is by this way, we only can exclude all hugepage together, but can't exclude the free hugepages only. I'm not sure if user need to dump out the activated hugepage only. > > Kumagai-san, please correct me, if I'm wrong. Yes, my patch treats all allocated hugetlbfs pages as user pages; it doesn't distinguish whether the pages are actually used or not. I made it so because I expect that to be enough for almost all users. We can introduce a new dump level once it's actually needed, but I don't think now is the time. Introducing it without demand would just make this tool more complex. Thanks Atsushi Kumagai ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-08 5:12 ` Atsushi Kumagai @ 2013-11-08 5:21 ` HATAYAMA Daisuke 2013-11-08 5:27 ` Jingbai Ma 0 siblings, 1 reply; 25+ messages in thread From: HATAYAMA Daisuke @ 2013-11-08 5:21 UTC (permalink / raw) To: Atsushi Kumagai Cc: jingbai.ma, vgoyal, bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm (2013/11/08 14:12), Atsushi Kumagai wrote: > [...] > Yes, my patch treats all allocated hugetlbfs pages as user pages, > doesn't distinguish whether the pages are actually used or not. > I made so because I guess it's enough for almost all users. > > We can introduce new dump level after it's needed actually, > but I don't think now is the time. To introduce it without > demand will make this tool just more complex. Typically, users allocate only as many huge pages as they actually use, so as not to waste system memory. So this design seems reasonable. -- Thanks. HATAYAMA, Daisuke ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-08 5:21 ` HATAYAMA Daisuke @ 2013-11-08 5:27 ` Jingbai Ma 2013-11-11 9:06 ` Petr Tesarik 0 siblings, 1 reply; 25+ messages in thread From: Jingbai Ma @ 2013-11-08 5:27 UTC (permalink / raw) To: HATAYAMA Daisuke Cc: Atsushi Kumagai, jingbai.ma, vgoyal, bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm On 11/08/2013 01:21 PM, HATAYAMA Daisuke wrote: > [...] > Typically, users would allocate huge pages as much as actually they use only, > in order not to waste system memory. So, this design seems reasonable. OK, it looks reasonable. Thanks! -- Thanks, Jingbai Ma ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-08 5:27 ` Jingbai Ma @ 2013-11-11 9:06 ` Petr Tesarik 0 siblings, 0 replies; 25+ messages in thread From: Petr Tesarik @ 2013-11-11 9:06 UTC (permalink / raw) To: Jingbai Ma Cc: HATAYAMA Daisuke, Atsushi Kumagai, vgoyal, bhe, tom.vaden, kexec, linux-kernel, lisa.mitchell, anderson, ebiederm On Fri, 08 Nov 2013 13:27:05 +0800 Jingbai Ma <jingbai.ma@hp.com> wrote: > [...] > > Typically, users would allocate huge pages as much as actually they use only, > > in order not to waste system memory. So, this design seems reasonable. > > OK, It looks reasonable. Agreed. Whether a page is a huge page or not is an implementation detail (and with THP even more so). Makedumpfile users should only be concerned about the _meaning_ of what gets filtered, not about implementation details. If we expose too much of the implementation, it may become hard to maintain backward compatibility one day... Thank you very much for all the work! Petr T ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-06 2:21 ` Atsushi Kumagai 2013-11-06 14:23 ` Vivek Goyal @ 2013-11-07 0:54 ` HATAYAMA Daisuke 2013-11-22 7:16 ` HATAYAMA Daisuke 1 sibling, 1 reply; 25+ messages in thread From: HATAYAMA Daisuke @ 2013-11-07 0:54 UTC (permalink / raw) To: Atsushi Kumagai Cc: vgoyal, bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm, jingbai.ma (2013/11/06 11:21), Atsushi Kumagai wrote: > (2013/11/06 5:27), Vivek Goyal wrote: >> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote: >>> This patch set intend to exclude unnecessary hugepages from vmcore dump file. >>> >>> This patch requires the kernel patch to export necessary data structures into >>> vmcore: "kexec: export hugepage data structure into vmcoreinfo" >>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html >>> >>> This patch introduce two new dump levels 32 and 64 to exclude all unused and >>> active hugepages. The level to exclude all unnecessary pages will be 127 now. >> >> Interesting. Why hugepages should be treated any differentely than normal >> pages? >> >> If user asked to filter out free page, then it should be filtered and >> it should not matter whether it is a huge page or not? > > I'm making a RFC patch of hugepages filtering based on such policy. > > I attach the prototype version. > It's able to filter out also THPs, and suitable for cyclic processing > because it depends on mem_map and looking up it can be divided into > cycles. This is the same idea as page_is_buddy(). > > So I think it's better. > > @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map, > && !isAnon(mapping)) { > if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) > pfn_cache_private++; > + /* > + * NOTE: If THP for cache is introduced, the check for > + * compound pages is needed here. > + */ > } > /* > * Exclude the data page of the user process. 
> */ > - else if ((info->dump_level & DL_EXCLUDE_USER_DATA) > - && isAnon(mapping)) { > - if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) > - pfn_user++; > + else if (info->dump_level & DL_EXCLUDE_USER_DATA) { > + /* > + * Exclude the anonnymous pages as user pages. > + */ > + if (isAnon(mapping)) { > + if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) > + pfn_user++; > + > + /* > + * Check the compound page > + */ > + if (page_is_hugepage(flags) && compound_order > 0) { > + int i, nr_pages = 1 << compound_order; > + > + for (i = 1; i < nr_pages; ++i) { > + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) > + pfn_user++; > + } > + pfn += nr_pages - 2; > + mem_map += (nr_pages - 1) * SIZE(page); > + } > + } > + /* > + * Exclude the hugetlbfs pages as user pages. > + */ > + else if (hugetlb_dtor == SYMBOL(free_huge_page)) { > + int i, nr_pages = 1 << compound_order; > + > + for (i = 0; i < nr_pages; ++i) { > + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) > + pfn_user++; > + } > + pfn += nr_pages - 1; > + mem_map += (nr_pages - 1) * SIZE(page); > + } > } > /* > * Exclude the hwpoison page. I'm concerned about the case that filtering is not performed to part of mem_map entries not belonging to the current cyclic range. If maximum value of compound_order is larger than maximum value of CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains by ARRAY_LENGTH(zone.free_area), it's necessary to align info->bufsize_cyclic with larger one in check_cyclic_buffer_overrun(). -- Thanks. HATAYAMA, Daisuke ^ permalink raw reply [flat|nested] 25+ messages in thread
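The overrun case described in this message is easier to see with concrete numbers. The sketch below is an editorial illustration, not code from the thread: `missed_tail_pages()` is a hypothetical helper that counts how many tail pages of a compound block spill past the end of the current cycle and would therefore escape a purely per-cycle filter (assuming the head page lies inside the cycle).

```c
#include <assert.h>

/* Count how many tail pages of a compound block whose head is at
 * 'head_pfn' (with 2^order pages in total) fall at or beyond
 * 'cycle_end_pfn' (exclusive end of the current cycle).  These are
 * exactly the pages a per-cycle filter would fail to exclude. */
unsigned long missed_tail_pages(unsigned long head_pfn, unsigned int order,
                                unsigned long cycle_end_pfn)
{
    unsigned long nr_pages = 1UL << order;
    unsigned long block_end = head_pfn + nr_pages;   /* exclusive */

    if (block_end <= cycle_end_pfn)
        return 0;                /* whole block inside the cycle */
    return block_end - cycle_end_pfn;
}
```

For example, a 512-page (order-9) hugepage whose head is at PFN 1000 in a cycle ending at PFN 1024 leaves 488 tail pages unfiltered unless the next cycle compensates.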
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-07 0:54 ` HATAYAMA Daisuke @ 2013-11-22 7:16 ` HATAYAMA Daisuke 2013-11-28 7:08 ` Atsushi Kumagai 0 siblings, 1 reply; 25+ messages in thread From: HATAYAMA Daisuke @ 2013-11-22 7:16 UTC (permalink / raw) To: Atsushi Kumagai Cc: bhe, tom.vaden, kexec, jingbai.ma, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal (2013/11/07 9:54), HATAYAMA Daisuke wrote: > (2013/11/06 11:21), Atsushi Kumagai wrote: >> (2013/11/06 5:27), Vivek Goyal wrote: >>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote: >>>> This patch set intend to exclude unnecessary hugepages from vmcore dump file. >>>> >>>> This patch requires the kernel patch to export necessary data structures into >>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo" >>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html >>>> >>>> This patch introduce two new dump levels 32 and 64 to exclude all unused and >>>> active hugepages. The level to exclude all unnecessary pages will be 127 now. >>> >>> Interesting. Why hugepages should be treated any differentely than normal >>> pages? >>> >>> If user asked to filter out free page, then it should be filtered and >>> it should not matter whether it is a huge page or not? >> >> I'm making a RFC patch of hugepages filtering based on such policy. >> >> I attach the prototype version. >> It's able to filter out also THPs, and suitable for cyclic processing >> because it depends on mem_map and looking up it can be divided into >> cycles. This is the same idea as page_is_buddy(). >> >> So I think it's better. >> > >> @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map, >> && !isAnon(mapping)) { >> if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) >> pfn_cache_private++; >> + /* >> + * NOTE: If THP for cache is introduced, the check for >> + * compound pages is needed here. 
>> + */ >> } >> /* >> * Exclude the data page of the user process. >> */ >> - else if ((info->dump_level & DL_EXCLUDE_USER_DATA) >> - && isAnon(mapping)) { >> - if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) >> - pfn_user++; >> + else if (info->dump_level & DL_EXCLUDE_USER_DATA) { >> + /* >> + * Exclude the anonnymous pages as user pages. >> + */ >> + if (isAnon(mapping)) { >> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) >> + pfn_user++; >> + >> + /* >> + * Check the compound page >> + */ >> + if (page_is_hugepage(flags) && compound_order > 0) { >> + int i, nr_pages = 1 << compound_order; >> + >> + for (i = 1; i < nr_pages; ++i) { >> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) >> + pfn_user++; >> + } >> + pfn += nr_pages - 2; >> + mem_map += (nr_pages - 1) * SIZE(page); >> + } >> + } >> + /* >> + * Exclude the hugetlbfs pages as user pages. >> + */ >> + else if (hugetlb_dtor == SYMBOL(free_huge_page)) { >> + int i, nr_pages = 1 << compound_order; >> + >> + for (i = 0; i < nr_pages; ++i) { >> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) >> + pfn_user++; >> + } >> + pfn += nr_pages - 1; >> + mem_map += (nr_pages - 1) * SIZE(page); >> + } >> } >> /* >> * Exclude the hwpoison page. > > I'm concerned about the case that filtering is not performed to part of mem_map > entries not belonging to the current cyclic range. > > If maximum value of compound_order is larger than maximum value of > CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains by ARRAY_LENGTH(zone.free_area), > it's necessary to align info->bufsize_cyclic with larger one in > check_cyclic_buffer_overrun(). > ping, in case you overlooked this... -- Thanks. HATAYAMA, Daisuke ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-22 7:16 ` HATAYAMA Daisuke @ 2013-11-28 7:08 ` Atsushi Kumagai 2013-11-28 7:48 ` HATAYAMA Daisuke 0 siblings, 1 reply; 25+ messages in thread From: Atsushi Kumagai @ 2013-11-28 7:08 UTC (permalink / raw) To: HATAYAMA Daisuke Cc: bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, vgoyal, anderson, ebiederm, jingbai.ma On 2013/11/22 16:18:20, kexec <kexec-bounces@lists.infradead.org> wrote: > (2013/11/07 9:54), HATAYAMA Daisuke wrote: > > (2013/11/06 11:21), Atsushi Kumagai wrote: > >> (2013/11/06 5:27), Vivek Goyal wrote: > >>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote: > >>>> This patch set intend to exclude unnecessary hugepages from vmcore dump file. > >>>> > >>>> This patch requires the kernel patch to export necessary data structures into > >>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo" > >>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html > >>>> > >>>> This patch introduce two new dump levels 32 and 64 to exclude all unused and > >>>> active hugepages. The level to exclude all unnecessary pages will be 127 now. > >>> > >>> Interesting. Why hugepages should be treated any differentely than normal > >>> pages? > >>> > >>> If user asked to filter out free page, then it should be filtered and > >>> it should not matter whether it is a huge page or not? > >> > >> I'm making a RFC patch of hugepages filtering based on such policy. > >> > >> I attach the prototype version. > >> It's able to filter out also THPs, and suitable for cyclic processing > >> because it depends on mem_map and looking up it can be divided into > >> cycles. This is the same idea as page_is_buddy(). > >> > >> So I think it's better. 
> >> > > > >> @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map, > >> && !isAnon(mapping)) { > >> if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) > >> pfn_cache_private++; > >> + /* > >> + * NOTE: If THP for cache is introduced, the check for > >> + * compound pages is needed here. > >> + */ > >> } > >> /* > >> * Exclude the data page of the user process. > >> */ > >> - else if ((info->dump_level & DL_EXCLUDE_USER_DATA) > >> - && isAnon(mapping)) { > >> - if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) > >> - pfn_user++; > >> + else if (info->dump_level & DL_EXCLUDE_USER_DATA) { > >> + /* > >> + * Exclude the anonnymous pages as user pages. > >> + */ > >> + if (isAnon(mapping)) { > >> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) > >> + pfn_user++; > >> + > >> + /* > >> + * Check the compound page > >> + */ > >> + if (page_is_hugepage(flags) && compound_order > 0) { > >> + int i, nr_pages = 1 << compound_order; > >> + > >> + for (i = 1; i < nr_pages; ++i) { > >> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) > >> + pfn_user++; > >> + } > >> + pfn += nr_pages - 2; > >> + mem_map += (nr_pages - 1) * SIZE(page); > >> + } > >> + } > >> + /* > >> + * Exclude the hugetlbfs pages as user pages. > >> + */ > >> + else if (hugetlb_dtor == SYMBOL(free_huge_page)) { > >> + int i, nr_pages = 1 << compound_order; > >> + > >> + for (i = 0; i < nr_pages; ++i) { > >> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) > >> + pfn_user++; > >> + } > >> + pfn += nr_pages - 1; > >> + mem_map += (nr_pages - 1) * SIZE(page); > >> + } > >> } > >> /* > >> * Exclude the hwpoison page. > > > > I'm concerned about the case that filtering is not performed to part of mem_map > > entries not belonging to the current cyclic range. 
> > > > If maximum value of compound_order is larger than maximum value of > > CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains by ARRAY_LENGTH(zone.free_area), > > it's necessary to align info->bufsize_cyclic with larger one in > > check_cyclic_buffer_overrun(). > > > > ping, in case you overlooked this... Sorry for the delayed response, I prioritize the release of v1.5.5 now. Thanks for your advice, check_cyclic_buffer_overrun() should be fixed as you said. In addition, I'm considering other way to address such case, that is to bring the number of "overflowed pages" to the next cycle and exclude them at the top of __exclude_unnecessary_pages() like below: /* * The pages which should be excluded still remain. */ if (remainder >= 1) { int i; unsigned long tmp; for (i = 0; i < remainder; ++i) { if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { pfn_user++; tmp++; } } pfn += tmp; remainder -= tmp; mem_map += (tmp - 1) * SIZE(page); continue; } If this way works well, then aligning info->buf_size_cyclic will be unnecessary. Thanks Atsushi Kumagai > -- > Thanks. > HATAYAMA, Daisuke > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-28 7:08 ` Atsushi Kumagai @ 2013-11-28 7:48 ` HATAYAMA Daisuke 0 siblings, 0 replies; 25+ messages in thread From: HATAYAMA Daisuke @ 2013-11-28 7:48 UTC (permalink / raw) To: Atsushi Kumagai Cc: bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, vgoyal, anderson, ebiederm, jingbai.ma (2013/11/28 16:08), Atsushi Kumagai wrote: > On 2013/11/22 16:18:20, kexec <kexec-bounces@lists.infradead.org> wrote: >> (2013/11/07 9:54), HATAYAMA Daisuke wrote: >>> (2013/11/06 11:21), Atsushi Kumagai wrote: >>>> (2013/11/06 5:27), Vivek Goyal wrote: >>>>> On Tue, Nov 05, 2013 at 09:45:32PM +0800, Jingbai Ma wrote: >>>>>> This patch set intend to exclude unnecessary hugepages from vmcore dump file. >>>>>> >>>>>> This patch requires the kernel patch to export necessary data structures into >>>>>> vmcore: "kexec: export hugepage data structure into vmcoreinfo" >>>>>> http://lists.infradead.org/pipermail/kexec/2013-November/009997.html >>>>>> >>>>>> This patch introduce two new dump levels 32 and 64 to exclude all unused and >>>>>> active hugepages. The level to exclude all unnecessary pages will be 127 now. >>>>> >>>>> Interesting. Why hugepages should be treated any differentely than normal >>>>> pages? >>>>> >>>>> If user asked to filter out free page, then it should be filtered and >>>>> it should not matter whether it is a huge page or not? >>>> >>>> I'm making a RFC patch of hugepages filtering based on such policy. >>>> >>>> I attach the prototype version. >>>> It's able to filter out also THPs, and suitable for cyclic processing >>>> because it depends on mem_map and looking up it can be divided into >>>> cycles. This is the same idea as page_is_buddy(). >>>> >>>> So I think it's better. 
>>>> >>> >>>> @@ -4506,14 +4583,49 @@ __exclude_unnecessary_pages(unsigned long mem_map, >>>> && !isAnon(mapping)) { >>>> if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) >>>> pfn_cache_private++; >>>> + /* >>>> + * NOTE: If THP for cache is introduced, the check for >>>> + * compound pages is needed here. >>>> + */ >>>> } >>>> /* >>>> * Exclude the data page of the user process. >>>> */ >>>> - else if ((info->dump_level & DL_EXCLUDE_USER_DATA) >>>> - && isAnon(mapping)) { >>>> - if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) >>>> - pfn_user++; >>>> + else if (info->dump_level & DL_EXCLUDE_USER_DATA) { >>>> + /* >>>> + * Exclude the anonnymous pages as user pages. >>>> + */ >>>> + if (isAnon(mapping)) { >>>> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn)) >>>> + pfn_user++; >>>> + >>>> + /* >>>> + * Check the compound page >>>> + */ >>>> + if (page_is_hugepage(flags) && compound_order > 0) { >>>> + int i, nr_pages = 1 << compound_order; >>>> + >>>> + for (i = 1; i < nr_pages; ++i) { >>>> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) >>>> + pfn_user++; >>>> + } >>>> + pfn += nr_pages - 2; >>>> + mem_map += (nr_pages - 1) * SIZE(page); >>>> + } >>>> + } >>>> + /* >>>> + * Exclude the hugetlbfs pages as user pages. >>>> + */ >>>> + else if (hugetlb_dtor == SYMBOL(free_huge_page)) { >>>> + int i, nr_pages = 1 << compound_order; >>>> + >>>> + for (i = 0; i < nr_pages; ++i) { >>>> + if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) >>>> + pfn_user++; >>>> + } >>>> + pfn += nr_pages - 1; >>>> + mem_map += (nr_pages - 1) * SIZE(page); >>>> + } >>>> } >>>> /* >>>> * Exclude the hwpoison page. >>> >>> I'm concerned about the case that filtering is not performed to part of mem_map >>> entries not belonging to the current cyclic range. 
>>> >>> If maximum value of compound_order is larger than maximum value of >>> CONFIG_FORCE_MAX_ZONEORDER, which makedumpfile obtains by ARRAY_LENGTH(zone.free_area), >>> it's necessary to align info->bufsize_cyclic with larger one in >>> check_cyclic_buffer_overrun(). >>> >> >> ping, in case you overlooked this... > > Sorry for the delayed response, I prioritize the release of v1.5.5 now. > > Thanks for your advice, check_cyclic_buffer_overrun() should be fixed > as you said. In addition, I'm considering other way to address such case, > that is to bring the number of "overflowed pages" to the next cycle and > exclude them at the top of __exclude_unnecessary_pages() like below: > > /* > * The pages which should be excluded still remain. > */ > if (remainder >= 1) { > int i; > unsigned long tmp; > for (i = 0; i < remainder; ++i) { > if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { > pfn_user++; > tmp++; > } > } > pfn += tmp; > remainder -= tmp; > mem_map += (tmp - 1) * SIZE(page); > continue; > } > > If this way works well, then aligning info->buf_size_cyclic will be > unnecessary. > I selected the current implementation of changing cyclic buffer size becuase I thought it was simpler than carrying over remaining filtered pages to next cycle in that there was no need to add extra code in filtering processing. I guess the reason why you think this is better now is how to detect maximum order of huge page is hard in some way, right? -- Thanks. HATAYAMA, Daisuke ^ permalink raw reply [flat|nested] 25+ messages in thread
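The buffer-resizing alternative discussed here can be sketched as follows (editorial example: `align_bufsize_cyclic()` is a hypothetical helper mirroring the idea behind check_cyclic_buffer_overrun(), under the assumption that each bitmap byte tracks 8 page frames). Rounding the cyclic buffer up so that one cycle covers a whole multiple of the largest block means no compound block can straddle a cycle boundary:

```c
#include <assert.h>

/* Round 'bufsize' (bytes) up so that the pages covered by one cycle
 * (8 pages per byte) form a whole multiple of the largest block,
 * 1 << max_order pages.  Hypothetical helper for illustration. */
unsigned long align_bufsize_cyclic(unsigned long bufsize,
                                   unsigned int max_order)
{
    unsigned long block_pages = 1UL << max_order;
    unsigned long pages = bufsize * 8;
    unsigned long rem = pages % block_pages;

    if (rem)
        pages += block_pages - rem;
    return pages / 8;
}
```

For instance, a 1000-byte buffer covers 8000 pages; with order-10 (1024-page) blocks it must grow to 1024 bytes so that each cycle spans exactly eight such blocks.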
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump @ 2013-11-29 3:02 Atsushi Kumagai 2013-11-29 3:21 ` HATAYAMA Daisuke 0 siblings, 1 reply; 25+ messages in thread From: Atsushi Kumagai @ 2013-11-29 3:02 UTC (permalink / raw) To: HATAYAMA Daisuke Cc: bhe, tom.vaden, kexec, jingbai.ma, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal On 2013/11/28 16:50:21, kexec <kexec-bounces@lists.infradead.org> wrote: > >> ping, in case you overlooked this... > > > > Sorry for the delayed response, I prioritize the release of v1.5.5 now. > > > > Thanks for your advice, check_cyclic_buffer_overrun() should be fixed > > as you said. In addition, I'm considering other way to address such case, > > that is to bring the number of "overflowed pages" to the next cycle and > > exclude them at the top of __exclude_unnecessary_pages() like below: > > > > /* > > * The pages which should be excluded still remain. > > */ > > if (remainder >= 1) { > > int i; > > unsigned long tmp; > > for (i = 0; i < remainder; ++i) { > > if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { > > pfn_user++; > > tmp++; > > } > > } > > pfn += tmp; > > remainder -= tmp; > > mem_map += (tmp - 1) * SIZE(page); > > continue; > > } > > > > If this way works well, then aligning info->buf_size_cyclic will be > > unnecessary. > > > > I selected the current implementation of changing cyclic buffer size becuase > I thought it was simpler than carrying over remaining filtered pages to next cycle > in that there was no need to add extra code in filtering processing. > > I guess the reason why you think this is better now is how to detect maximum order of > huge page is hard in some way, right? The maximum order will be gotten from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER, so I don't say it's hard. However, the carrying over method doesn't depend on such kernel symbols, so I think it's robuster. Thanks Atsushi Kumagai > -- > Thanks. 
> HATAYAMA, Daisuke > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-29 3:02 Atsushi Kumagai @ 2013-11-29 3:21 ` HATAYAMA Daisuke 2013-11-29 4:23 ` Atsushi Kumagai 0 siblings, 1 reply; 25+ messages in thread From: HATAYAMA Daisuke @ 2013-11-29 3:21 UTC (permalink / raw) To: Atsushi Kumagai Cc: bhe, tom.vaden, kexec, jingbai.ma, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal (2013/11/29 12:02), Atsushi Kumagai wrote: > On 2013/11/28 16:50:21, kexec <kexec-bounces@lists.infradead.org> wrote: >>>> ping, in case you overlooked this... >>> >>> Sorry for the delayed response, I prioritize the release of v1.5.5 now. >>> >>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed >>> as you said. In addition, I'm considering other way to address such case, >>> that is to bring the number of "overflowed pages" to the next cycle and >>> exclude them at the top of __exclude_unnecessary_pages() like below: >>> >>> /* >>> * The pages which should be excluded still remain. >>> */ >>> if (remainder >= 1) { >>> int i; >>> unsigned long tmp; >>> for (i = 0; i < remainder; ++i) { >>> if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { >>> pfn_user++; >>> tmp++; >>> } >>> } >>> pfn += tmp; >>> remainder -= tmp; >>> mem_map += (tmp - 1) * SIZE(page); >>> continue; >>> } >>> >>> If this way works well, then aligning info->buf_size_cyclic will be >>> unnecessary. >>> >> >> I selected the current implementation of changing cyclic buffer size becuase >> I thought it was simpler than carrying over remaining filtered pages to next cycle >> in that there was no need to add extra code in filtering processing. >> >> I guess the reason why you think this is better now is how to detect maximum order of >> huge page is hard in some way, right? > > The maximum order will be gotten from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER, > so I don't say it's hard. However, the carrying over method doesn't depend on > such kernel symbols, so I think it's robuster. 
> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of free page filtering in __exclude_unnecessary_pages(). Could you do that too? -- Thanks. HATAYAMA, Daisuke ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-29 3:21 ` HATAYAMA Daisuke @ 2013-11-29 4:23 ` Atsushi Kumagai 2013-11-29 4:56 ` HATAYAMA Daisuke 0 siblings, 1 reply; 25+ messages in thread From: Atsushi Kumagai @ 2013-11-29 4:23 UTC (permalink / raw) To: HATAYAMA Daisuke Cc: bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, vgoyal, anderson, ebiederm, jingbai.ma On 2013/11/29 12:24:45, kexec <kexec-bounces@lists.infradead.org> wrote: > (2013/11/29 12:02), Atsushi Kumagai wrote: > > On 2013/11/28 16:50:21, kexec <kexec-bounces@lists.infradead.org> wrote: > >>>> ping, in case you overlooked this... > >>> > >>> Sorry for the delayed response, I prioritize the release of v1.5.5 now. > >>> > >>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed > >>> as you said. In addition, I'm considering other way to address such case, > >>> that is to bring the number of "overflowed pages" to the next cycle and > >>> exclude them at the top of __exclude_unnecessary_pages() like below: > >>> > >>> /* > >>> * The pages which should be excluded still remain. > >>> */ > >>> if (remainder >= 1) { > >>> int i; > >>> unsigned long tmp; > >>> for (i = 0; i < remainder; ++i) { > >>> if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { > >>> pfn_user++; > >>> tmp++; > >>> } > >>> } > >>> pfn += tmp; > >>> remainder -= tmp; > >>> mem_map += (tmp - 1) * SIZE(page); > >>> continue; > >>> } > >>> > >>> If this way works well, then aligning info->buf_size_cyclic will be > >>> unnecessary. > >>> > >> > >> I selected the current implementation of changing cyclic buffer size becuase > >> I thought it was simpler than carrying over remaining filtered pages to next cycle > >> in that there was no need to add extra code in filtering processing. > >> > >> I guess the reason why you think this is better now is how to detect maximum order of > >> huge page is hard in some way, right? 
> > > > The maximum order will be gotten from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER, > > so I don't say it's hard. However, the carrying over method doesn't depend on > > such kernel symbols, so I think it's robuster. > > > > Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of free page > filtering in __exclude_unnecessary_pages(). Could you do that too? Sure, I'll modify it too. Thanks Atsushi Kumagai ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-11-29 4:23 ` Atsushi Kumagai @ 2013-11-29 4:56 ` HATAYAMA Daisuke 0 siblings, 0 replies; 25+ messages in thread From: HATAYAMA Daisuke @ 2013-11-29 4:56 UTC (permalink / raw) To: Atsushi Kumagai Cc: bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, vgoyal, anderson, ebiederm, jingbai.ma (2013/11/29 13:23), Atsushi Kumagai wrote: > On 2013/11/29 12:24:45, kexec <kexec-bounces@lists.infradead.org> wrote: >> (2013/11/29 12:02), Atsushi Kumagai wrote: >>> On 2013/11/28 16:50:21, kexec <kexec-bounces@lists.infradead.org> wrote: >>>>>> ping, in case you overlooked this... >>>>> >>>>> Sorry for the delayed response, I prioritize the release of v1.5.5 now. >>>>> >>>>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed >>>>> as you said. In addition, I'm considering other way to address such case, >>>>> that is to bring the number of "overflowed pages" to the next cycle and >>>>> exclude them at the top of __exclude_unnecessary_pages() like below: >>>>> >>>>> /* >>>>> * The pages which should be excluded still remain. >>>>> */ >>>>> if (remainder >= 1) { >>>>> int i; >>>>> unsigned long tmp; >>>>> for (i = 0; i < remainder; ++i) { >>>>> if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { >>>>> pfn_user++; >>>>> tmp++; >>>>> } >>>>> } >>>>> pfn += tmp; >>>>> remainder -= tmp; >>>>> mem_map += (tmp - 1) * SIZE(page); >>>>> continue; >>>>> } >>>>> >>>>> If this way works well, then aligning info->buf_size_cyclic will be >>>>> unnecessary. >>>>> >>>> >>>> I selected the current implementation of changing cyclic buffer size becuase >>>> I thought it was simpler than carrying over remaining filtered pages to next cycle >>>> in that there was no need to add extra code in filtering processing. >>>> >>>> I guess the reason why you think this is better now is how to detect maximum order of >>>> huge page is hard in some way, right? 
>>> >>> The maximum order will be gotten from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER, >>> so I don't say it's hard. However, the carrying over method doesn't depend on >>> such kernel symbols, so I think it's robuster. >>> >> >> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of free page >> filtering in __exclude_unnecessary_pages(). Could you do that too? > > Sure, I'll modify it too. > This is a suggestion from different point of view... In general, data on crash dump can be corrupted. Thus, order contained in a page descriptor can also be corrupted. For example, if the corrupted value were a huge number, wide range of pages after buddy page would be filtered falsely. So, actually we should sanity check data in crash dump before using them for application level feature. I've picked up order contained in page descriptor, so there would be other data used in makedumpfile that are not checked. Unlike diskdump, we no longer need to care about kernel/hardware level data integrity outside of user-land, but we still care about data its own integrity. On the other hand, if we do it, we might face some difficulty, for example, hardness of maintenance or performance bottleneck; it might be the reason why we don't see sanity check in makedumpfile now. -- Thanks. HATAYAMA, Daisuke ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump @ 2013-12-03 8:05 Atsushi Kumagai 2013-12-03 9:05 ` HATAYAMA Daisuke 0 siblings, 1 reply; 25+ messages in thread From: Atsushi Kumagai @ 2013-12-03 8:05 UTC (permalink / raw) To: HATAYAMA Daisuke Cc: bhe, tom.vaden, kexec, jingbai.ma, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal On 2013/11/29 13:57:21, kexec <kexec-bounces@lists.infradead.org> wrote: > (2013/11/29 13:23), Atsushi Kumagai wrote: > > On 2013/11/29 12:24:45, kexec <kexec-bounces@lists.infradead.org> wrote: > >> (2013/11/29 12:02), Atsushi Kumagai wrote: > >>> On 2013/11/28 16:50:21, kexec <kexec-bounces@lists.infradead.org> wrote: > >>>>>> ping, in case you overlooked this... > >>>>> > >>>>> Sorry for the delayed response, I prioritize the release of v1.5.5 now. > >>>>> > >>>>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed > >>>>> as you said. In addition, I'm considering other way to address such case, > >>>>> that is to bring the number of "overflowed pages" to the next cycle and > >>>>> exclude them at the top of __exclude_unnecessary_pages() like below: > >>>>> > >>>>> /* > >>>>> * The pages which should be excluded still remain. > >>>>> */ > >>>>> if (remainder >= 1) { > >>>>> int i; > >>>>> unsigned long tmp; > >>>>> for (i = 0; i < remainder; ++i) { > >>>>> if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { > >>>>> pfn_user++; > >>>>> tmp++; > >>>>> } > >>>>> } > >>>>> pfn += tmp; > >>>>> remainder -= tmp; > >>>>> mem_map += (tmp - 1) * SIZE(page); > >>>>> continue; > >>>>> } > >>>>> > >>>>> If this way works well, then aligning info->buf_size_cyclic will be > >>>>> unnecessary. > >>>>> > >>>> > >>>> I selected the current implementation of changing cyclic buffer size becuase > >>>> I thought it was simpler than carrying over remaining filtered pages to next cycle > >>>> in that there was no need to add extra code in filtering processing. 
> >>>> > >>>> I guess the reason why you think this is better now is how to detect maximum order of > >>>> huge page is hard in some way, right? > >>> > >>> The maximum order will be gotten from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER, > >>> so I don't say it's hard. However, the carrying over method doesn't depend on > >>> such kernel symbols, so I think it's robuster. > >>> > >> > >> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of free page > >> filtering in __exclude_unnecessary_pages(). Could you do that too? > > > > Sure, I'll modify it too. > > > > This is a suggestion from different point of view... > > In general, data on crash dump can be corrupted. Thus, order contained in a page > descriptor can also be corrupted. For example, if the corrupted value were a huge > number, wide range of pages after buddy page would be filtered falsely. > > So, actually we should sanity check data in crash dump before using them for application > level feature. I've picked up order contained in page descriptor, so there would be other > data used in makedumpfile that are not checked. What you said is reasonable, but how will you do such a sanity check? Certain standard values are necessary for a sanity check; how will you prepare such values? (Get them from the kernel source and hard-code them in makedumpfile?) > Unlike diskdump, we no longer need to care about kernel/hardware level data integrity > outside of user-land, but we still care about data its own integrity. > > On the other hand, if we do it, we might face some difficulty, for example, hardness of > maintenance or performance bottleneck; it might be the reason why we don't see sanity > check in makedumpfile now. There are many values which should be checked, e.g. page.flags, page._count, page.mapping, list_head.next and so on. If we introduce sanity checks for them, the issues you mentioned will appear distinctly. So I think makedumpfile has to trust crash dump in practice.
Thanks Atsushi Kumagai > -- > Thanks. > HATAYAMA, Daisuke > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-12-03 8:05 Atsushi Kumagai @ 2013-12-03 9:05 ` HATAYAMA Daisuke 2013-12-04 6:08 ` Atsushi Kumagai 0 siblings, 1 reply; 25+ messages in thread From: HATAYAMA Daisuke @ 2013-12-03 9:05 UTC (permalink / raw) To: Atsushi Kumagai Cc: bhe, tom.vaden, kexec, jingbai.ma, ptesarik, linux-kernel, lisa.mitchell, anderson, ebiederm, vgoyal (2013/12/03 17:05), Atsushi Kumagai wrote: > On 2013/11/29 13:57:21, kexec <kexec-bounces@lists.infradead.org> wrote: >> (2013/11/29 13:23), Atsushi Kumagai wrote: >>> On 2013/11/29 12:24:45, kexec <kexec-bounces@lists.infradead.org> wrote: >>>> (2013/11/29 12:02), Atsushi Kumagai wrote: >>>>> On 2013/11/28 16:50:21, kexec <kexec-bounces@lists.infradead.org> wrote: >>>>>>>> ping, in case you overlooked this... >>>>>>> >>>>>>> Sorry for the delayed response, I'm prioritizing the release of v1.5.5 now. >>>>>>> >>>>>>> Thanks for your advice, check_cyclic_buffer_overrun() should be fixed >>>>>>> as you said. In addition, I'm considering another way to address such cases, >>>>>>> that is, to carry the number of "overflowed pages" over to the next cycle and >>>>>>> exclude them at the top of __exclude_unnecessary_pages() like below: >>>>>>> >>>>>>> /* >>>>>>> * The pages which should be excluded still remain. >>>>>>> */ >>>>>>> if (remainder >= 1) { >>>>>>> int i; >>>>>>> unsigned long tmp = 0; >>>>>>> for (i = 0; i < remainder; ++i) { >>>>>>> if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i)) { >>>>>>> pfn_user++; >>>>>>> tmp++; >>>>>>> } >>>>>>> } >>>>>>> pfn += tmp; >>>>>>> remainder -= tmp; >>>>>>> mem_map += (tmp - 1) * SIZE(page); >>>>>>> continue; >>>>>>> } >>>>>>> >>>>>>> If this way works well, then aligning info->buf_size_cyclic will be >>>>>>> unnecessary.
>>>>>>> >>>>>> >>>>>> I selected the current implementation of changing the cyclic buffer size because >>>>>> I thought it was simpler than carrying over the remaining filtered pages to the next cycle, >>>>>> in that there was no need to add extra code to the filtering processing. >>>>>> >>>>>> I guess the reason why you think this is better now is that detecting the maximum order of >>>>>> a huge page is hard in some way, right? >>>>> >>>>> The maximum order can be obtained from HUGETLB_PAGE_ORDER or HPAGE_PMD_ORDER, >>>>> so I don't say it's hard. However, the carrying-over method doesn't depend on >>>>> such kernel symbols, so I think it's more robust. >>>>> >>>> >>>> Then, it's better to remove check_cyclic_buffer_overrun() and rewrite part of the free page >>>> filtering in __exclude_unnecessary_pages(). Could you do that too? >>> >>> Sure, I'll modify it too. >>> >> >> This is a suggestion from a different point of view... >> >> In general, data in a crash dump can be corrupted. Thus, the order contained in a page >> descriptor can also be corrupted. For example, if the corrupted value were a huge >> number, a wide range of pages after a buddy page would be filtered falsely. >> >> So, actually we should sanity-check data in a crash dump before using it for application >> level features. I've picked out the order contained in the page descriptor, so there would be other >> data used in makedumpfile that are not checked. > > What you said is reasonable, but how will you do such a sanity check? > Certain reference values are necessary for a sanity check; how will > you prepare such values? > (Get them from the kernel source and hard-code them in makedumpfile?) > >> Unlike diskdump, we no longer need to care about kernel/hardware-level data integrity >> outside of user-land, but we still care about the data's own integrity.
>> >> On the other hand, if we do it, we might face some difficulties, for example, hardness of >> maintenance or performance bottlenecks; that might be the reason why we don't see sanity >> checks in makedumpfile now. > > There are many values which should be checked, e.g. page.flags, page._count, > page.mapping, list_head.next and so on. > If we introduce sanity checks for them, the issues you mentioned will appear > distinctly. > > So I think makedumpfile has to trust the crash dump in practice. > Yes, I don't mean such drastic checking; I understand the hardness because I often handle/write this kind of code; I don't want to fight tremendously many dependencies... So we need to concentrate on things that can affect makedumpfile's behavior significantly, e.g. an infinite loop caused by broken linked-list objects, a buffer overrun caused by large values from broken data, etc. We should be able to deal with them by carefully handling dump data against makedumpfile's runtime data structures, e.g., buffer size. Is it OK to consider this as the policy of makedumpfile for data corruption? -- Thanks. HATAYAMA, Daisuke ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump 2013-12-03 9:05 ` HATAYAMA Daisuke @ 2013-12-04 6:08 ` Atsushi Kumagai 0 siblings, 0 replies; 25+ messages in thread From: Atsushi Kumagai @ 2013-12-04 6:08 UTC (permalink / raw) To: HATAYAMA Daisuke Cc: bhe, tom.vaden, kexec, ptesarik, linux-kernel, lisa.mitchell, vgoyal, anderson, ebiederm, jingbai.ma On 2013/12/03 18:06:13, kexec <kexec-bounces@lists.infradead.org> wrote: > >> This is a suggestion from a different point of view... > >> > >> In general, data in a crash dump can be corrupted. Thus, the order contained in a page > >> descriptor can also be corrupted. For example, if the corrupted value were a huge > >> number, a wide range of pages after a buddy page would be filtered falsely. > >> > >> So, actually we should sanity-check data in a crash dump before using it for application > >> level features. I've picked out the order contained in the page descriptor, so there would be other > >> data used in makedumpfile that are not checked. > > > > What you said is reasonable, but how will you do such a sanity check? > > Certain reference values are necessary for a sanity check; how will > > you prepare such values? > > (Get them from the kernel source and hard-code them in makedumpfile?) > > > >> Unlike diskdump, we no longer need to care about kernel/hardware-level data integrity > >> outside of user-land, but we still care about the data's own integrity. > >> > >> On the other hand, if we do it, we might face some difficulties, for example, hardness of > >> maintenance or performance bottlenecks; that might be the reason why we don't see sanity > >> checks in makedumpfile now. > > > > There are many values which should be checked, e.g. page.flags, page._count, > > page.mapping, list_head.next and so on. > > If we introduce sanity checks for them, the issues you mentioned will appear > > distinctly. > > > > So I think makedumpfile has to trust the crash dump in practice.
> > > Yes, I don't mean such drastic checking; I understand the hardness because I often > handle/write this kind of code; I don't want to fight tremendously many dependencies... > > So we need to concentrate on things that can affect makedumpfile's behavior significantly, > e.g. an infinite loop caused by broken linked-list objects, a buffer overrun caused by large values > from broken data, etc. We should be able to deal with them by carefully handling > dump data against makedumpfile's runtime data structures, e.g., buffer size. > > Is it OK to consider this as the policy of makedumpfile for data corruption? Right. Of course, if there is a very simple and effective check for dump data, then we can take it. Thanks Atsushi Kumagai > -- > Thanks. > HATAYAMA, Daisuke > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec > ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2013-12-04 6:12 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-11-05 13:45 [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Jingbai Ma 2013-11-05 13:45 ` [PATCH 1/3] makedumpfile: hugepage filtering: add hugepage filtering functions Jingbai Ma 2013-11-05 13:45 ` [PATCH 2/3] makedumpfile: hugepage filtering: add excluding hugepage messages Jingbai Ma 2013-11-05 13:46 ` [PATCH 3/3] makedumpfile: hugepage filtering: add new dump levels for manual page Jingbai Ma 2013-11-05 20:26 ` [PATCH 0/3] makedumpfile: hugepage filtering for vmcore dump Vivek Goyal 2013-11-06 1:47 ` Jingbai Ma 2013-11-06 1:53 ` Vivek Goyal 2013-11-06 2:21 ` Atsushi Kumagai 2013-11-06 14:23 ` Vivek Goyal 2013-11-07 8:57 ` Jingbai Ma 2013-11-08 5:12 ` Atsushi Kumagai 2013-11-08 5:21 ` HATAYAMA Daisuke 2013-11-08 5:27 ` Jingbai Ma 2013-11-11 9:06 ` Petr Tesarik 2013-11-07 0:54 ` HATAYAMA Daisuke 2013-11-22 7:16 ` HATAYAMA Daisuke 2013-11-28 7:08 ` Atsushi Kumagai 2013-11-28 7:48 ` HATAYAMA Daisuke 2013-11-29 3:02 Atsushi Kumagai 2013-11-29 3:21 ` HATAYAMA Daisuke 2013-11-29 4:23 ` Atsushi Kumagai 2013-11-29 4:56 ` HATAYAMA Daisuke 2013-12-03 8:05 Atsushi Kumagai 2013-12-03 9:05 ` HATAYAMA Daisuke 2013-12-04 6:08 ` Atsushi Kumagai