* [PATCH] Makedumpfile: vmcore size estimate
@ 2014-06-11 12:39 Baoquan He
  2014-06-11 13:57 ` Baoquan He
  2014-06-23 12:36 ` Vivek Goyal
  0 siblings, 2 replies; 16+ messages in thread
From: Baoquan He @ 2014-06-11 12:39 UTC (permalink / raw)
  To: kexec; +Cc: kumagai-atsushi, Baoquan He, vgoyal

Users want to get a rough estimate of the vmcore size so they can decide
how much storage space to reserve for vmcore dumping. This can help them
deploy their machines better, possibly hundreds of machines.

In this draft patch, a new option is added:
    "--vmcore-estimate"
Users can execute the command below to get a dumped kcore. Since kcore is
an ELF file that maps the whole memory of the current kernel, it is roughly
equivalent to what the crash kernel would dump, though not exactly. The
content of kcore is dynamic, whereas /proc/vmcore is fixed once a crash has
happened. But for a vmcore size estimate, it is good enough.

sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
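
As a side note, the /proc/kcore PT_LOAD layout that this patch walks can
also be inspected directly; the command below is only illustrative, since
the segments differ from boot to boot:

sudo readelf -l /proc/kcore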

Questions:
1. Alternatively, we could get only the number of dumpable pages, then
calculate the estimated vmcore size with a predefined factor for kdump
compressed dumping. E.g. for an lzo dump, if we assume a compression ratio
of 45%, the estimated size is: (number of dumpable pages) * 4096 * 45%.

This is easier but rather rough. Does anybody prefer this over the real
dumping implemented in this draft patch? (A minimal sketch of this
calculation is appended below, after question 3.)

2. When dumping /proc/kcore there is still a bug I haven't been able to
fix. For an ELF dump, write_elf_header() pre-calculates num_loads_dumpfile,
the number of program segments that will be dumped. However, since the
content of /proc/kcore is dynamic, the final num_loads_dumpfile may change
by the time write_elf_pages_cyclic()/write_elf_pages() is called. This
leaves the dumped ELF file with a bad file format. If you execute
"readelf -a /var/crash/kcore-dump", you will be a little surprised.

3. This is not a formal patch. Once the final solution is decided, I will
post a patch, maybe a patchset. If you have suggestions about the code or
implementation, please post your comments.
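
For illustration, here is a minimal standalone sketch of the ratio-based
estimate described in question 1. The page count, the 4096-byte page size
and the 45% lzo ratio are assumed example values, not numbers that
makedumpfile reports:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/*
	 * Number of dumpable pages; in practice this would come from the
	 * dump bitmap after the -d filtering level has been applied.
	 */
	uint64_t dumpable_pages = 2 * 1024 * 1024;	/* assumed */
	uint64_t page_size = 4096;			/* assumed page size */
	double lzo_ratio = 0.45;			/* assumed compression ratio */

	uint64_t estimate = (uint64_t)(dumpable_pages * page_size * lzo_ratio);

	printf("estimated vmcore size: %llu bytes (%llu MiB)\n",
	       (unsigned long long)estimate,
	       (unsigned long long)(estimate >> 20));
	return 0;
}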

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 elf_info.c     | 136 ++++++++++++++++--
 elf_info.h     |  17 +++
 makedumpfile.c | 438 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
 makedumpfile.h |   5 +
 4 files changed, 560 insertions(+), 36 deletions(-)

diff --git a/elf_info.c b/elf_info.c
index b277f69..1b05ad1 100644
--- a/elf_info.c
+++ b/elf_info.c
@@ -36,16 +36,9 @@
 
 #define XEN_ELFNOTE_CRASH_INFO	(0x1000001)
 
-struct pt_load_segment {
-	off_t			file_offset;
-	unsigned long long	phys_start;
-	unsigned long long	phys_end;
-	unsigned long long	virt_start;
-	unsigned long long	virt_end;
-};
 
 static int			nr_cpus;             /* number of cpu */
-static off_t			max_file_offset;
+off_t			max_file_offset;
 
 /*
  * File information about /proc/vmcore:
@@ -60,9 +53,9 @@ static int			flags_memory;
 /*
  * PT_LOAD information about /proc/vmcore:
  */
-static unsigned int		num_pt_loads;
-static struct pt_load_segment	*pt_loads;
-static off_t			offset_pt_load_memory;
+unsigned int		num_pt_loads;
+struct pt_load_segment	*pt_loads;
+off_t			offset_pt_load_memory;
 
 /*
  * PT_NOTE information about /proc/vmcore:
@@ -395,7 +388,49 @@ get_pt_note_info(void)
 	return TRUE;
 }
 
+#define UNINITIALIZED  ((ulong)(-1))
 
+#define SEEK_ERROR       (-1)
+#define READ_ERROR       (-2)
+int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len)
+{
+	int i;
+	ulong kvaddr;
+	Elf64_Nhdr *note64;
+	off_t offset;
+	char note[MAX_SIZE_NHDR];
+	int size_desc;
+	off_t offset_desc;
+
+	offset = UNINITIALIZED;
+	kvaddr = (ulong)vmcoreinfo_addr | PAGE_OFFSET;
+
+	for (i = 0; i < num_pt_loads; ++i) {
+		struct pt_load_segment *p = &pt_loads[i];
+		if ((kvaddr >= p->virt_start) && (kvaddr < p->virt_end)) {
+			offset = (off_t)(kvaddr - p->virt_start) +
+			(off_t)p->file_offset;
+			break;
+		}
+	}
+
+	if (offset == UNINITIALIZED)
+		return SEEK_ERROR;
+
+        if (lseek(fd_memory, offset, SEEK_SET) != offset)
+		perror("lseek");
+
+	if (read(fd_memory, note, MAX_SIZE_NHDR) != MAX_SIZE_NHDR)
+		return READ_ERROR;
+
+	note64 = (Elf64_Nhdr *)note;
+	size_desc   = note_descsz(note);
+	offset_desc = offset + offset_note_desc(note);
+
+	set_vmcoreinfo(offset_desc, size_desc);
+
+	return 0;
+}
 /*
  * External functions.
  */
@@ -681,6 +716,55 @@ get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr)
 	return TRUE;
 }
 
+int
+get_elf_loads(int fd, char *filename)
+{
+	int i, j, phnum, elf_format;
+	Elf64_Phdr phdr;
+
+	/*
+	 * Check ELF64 or ELF32.
+	 */
+	elf_format = check_elf_format(fd, filename, &phnum, &num_pt_loads);
+	if (elf_format == ELF64)
+		flags_memory |= MEMORY_ELF64;
+	else if (elf_format != ELF32)
+		return FALSE;
+
+	if (!num_pt_loads) {
+		ERRMSG("Can't get the number of PT_LOAD.\n");
+		return FALSE;
+	}
+
+	/*
+	 * The below file information will be used as /proc/vmcore.
+	 */
+	fd_memory   = fd;
+	name_memory = filename;
+
+	pt_loads = calloc(sizeof(struct pt_load_segment), num_pt_loads);
+	if (pt_loads == NULL) {
+		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
+		    strerror(errno));
+		return FALSE;
+	}
+	for (i = 0, j = 0; i < phnum; i++) {
+		if (!get_phdr_memory(i, &phdr))
+			return FALSE;
+
+		if (phdr.p_type != PT_LOAD)
+			continue;
+
+		if (j >= num_pt_loads)
+			return FALSE;
+		if(!dump_Elf_load(&phdr, j))
+			return FALSE;
+		j++;
+	}
+
+	return TRUE;
+}
+
 /*
  * Get ELF information about /proc/vmcore.
  */
@@ -826,6 +910,36 @@ get_phdr_memory(int index, Elf64_Phdr *phdr)
 	return TRUE;
 }
 
+int
+get_phdr_load(int index, Elf64_Phdr *phdr)
+{
+	Elf32_Phdr phdr32;
+
+	if (is_elf64_memory()) { /* ELF64 */
+		phdr->p_type = PT_LOAD;
+		phdr->p_vaddr = pt_loads[index].virt_start;
+		phdr->p_paddr = pt_loads[index].phys_start;
+		phdr->p_memsz  = pt_loads[index].phys_end - pt_loads[index].phys_start;
+		phdr->p_filesz = phdr->p_memsz;
+		phdr->p_offset = pt_loads[index].file_offset;
+	} else {
+		if (!get_elf32_phdr(fd_memory, name_memory, index, &phdr32)) {
+			ERRMSG("Can't find Phdr %d.\n", index);
+			return FALSE;
+		}
+		memset(phdr, 0, sizeof(Elf64_Phdr));
+		phdr->p_type   = phdr32.p_type;
+		phdr->p_flags  = phdr32.p_flags;
+		phdr->p_offset = phdr32.p_offset;
+		phdr->p_vaddr  = phdr32.p_vaddr;
+		phdr->p_paddr  = phdr32.p_paddr;
+		phdr->p_filesz = phdr32.p_filesz;
+		phdr->p_memsz  = phdr32.p_memsz;
+		phdr->p_align  = phdr32.p_align;
+	}
+	return TRUE;
+}
+
 off_t
 get_offset_pt_load_memory(void)
 {
diff --git a/elf_info.h b/elf_info.h
index 801faff..0c67d74 100644
--- a/elf_info.h
+++ b/elf_info.h
@@ -27,6 +27,19 @@
 
 #define MAX_SIZE_NHDR	MAX(sizeof(Elf64_Nhdr), sizeof(Elf32_Nhdr))
 
+struct pt_load_segment {
+	off_t			file_offset;
+	unsigned long long	phys_start;
+	unsigned long long	phys_end;
+	unsigned long long	virt_start;
+	unsigned long long	virt_end;
+};
+
+extern off_t			max_file_offset;
+extern unsigned int		num_pt_loads;
+extern struct pt_load_segment	*pt_loads;
+
+extern off_t			offset_pt_load_memory;
 
 off_t paddr_to_offset(unsigned long long paddr);
 off_t paddr_to_offset2(unsigned long long paddr, off_t hint);
@@ -44,11 +57,14 @@ int get_elf64_ehdr(int fd, char *filename, Elf64_Ehdr *ehdr);
 int get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr);
 int get_elf_info(int fd, char *filename);
 void free_elf_info(void);
+int get_elf_loads(int fd, char *filename);
 
 int is_elf64_memory(void);
 int is_xen_memory(void);
 
 int get_phnum_memory(void);
+
+int get_phdr_load(int index, Elf64_Phdr *phdr);
 int get_phdr_memory(int index, Elf64_Phdr *phdr);
 off_t get_offset_pt_load_memory(void);
 int get_pt_load(int idx,
@@ -68,6 +84,7 @@ void get_pt_note(off_t *offset, unsigned long *size);
 int has_vmcoreinfo(void);
 void set_vmcoreinfo(off_t offset, unsigned long size);
 void get_vmcoreinfo(off_t *offset, unsigned long *size);
+int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len);
 
 int has_vmcoreinfo_xen(void);
 void get_vmcoreinfo_xen(off_t *offset, unsigned long *size);
diff --git a/makedumpfile.c b/makedumpfile.c
index 34db997..ac02747 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -5146,6 +5146,7 @@ create_dump_bitmap(void)
 
 	if (info->flag_cyclic) {
 
+		printf("create_dump_bitmap flag_cyclic\n");
 		if (info->flag_elf_dumpfile) {
 			if (!prepare_bitmap_buffer_cyclic())
 				goto out;
@@ -5189,14 +5190,23 @@ get_loads_dumpfile(void)
 
 	initialize_2nd_bitmap(&bitmap2);
 
-	if (!(phnum = get_phnum_memory()))
-		return FALSE;
-
-	for (i = 0; i < phnum; i++) {
-		if (!get_phdr_memory(i, &load))
+	if (info->flag_vmcore_estimate) {
+		phnum = num_pt_loads;
+	} else {
+		if (!(phnum = get_phnum_memory()))
 			return FALSE;
-		if (load.p_type != PT_LOAD)
-			continue;
+	}
+
+	for (i = 0; i < num_pt_loads; i++) {
+		if (info->flag_vmcore_estimate) {
+			get_phdr_load(i , &load);
+		} else {
+			if (!get_phdr_memory(i, &load))
+				return FALSE;
+
+			if (load.p_type != PT_LOAD)
+				continue;
+		}
 
 		pfn_start = paddr_to_pfn(load.p_paddr);
 		pfn_end   = paddr_to_pfn(load.p_paddr + load.p_memsz);
@@ -5734,17 +5744,26 @@ write_elf_pages(struct cache_data *cd_header, struct cache_data *cd_page)
 	off_seg_load    = info->offset_load_dumpfile;
 	cd_page->offset = info->offset_load_dumpfile;
 
-	if (!(phnum = get_phnum_memory()))
-		return FALSE;
+	if (info->flag_vmcore_estimate) {
+		phnum = num_pt_loads;
+	} else { 
+		if (!(phnum = get_phnum_memory()))
+			return FALSE;
+	}
 
 	gettimeofday(&tv_start, NULL);
 
 	for (i = 0; i < phnum; i++) {
-		if (!get_phdr_memory(i, &load))
-			return FALSE;
+		if (info->flag_vmcore_estimate) {
+			memset(&load, 0, sizeof(load));
+			get_phdr_load(i , &load);
+		} else {
+			if (!get_phdr_memory(i, &load))
+				return FALSE;
 
-		if (load.p_type != PT_LOAD)
-			continue;
+			if (load.p_type != PT_LOAD)
+				continue;
+		}
 
 		off_memory= load.p_offset;
 		paddr     = load.p_paddr;
@@ -5923,14 +5942,24 @@ get_loads_dumpfile_cyclic(void)
 	Elf64_Phdr load;
 	struct cycle cycle = {0};
 
-	if (!(phnum = get_phnum_memory()))
-		return FALSE;
+	if (info->flag_vmcore_estimate) {
+		phnum = num_pt_loads;
+	} else {
+		if (!(phnum = get_phnum_memory()))
+			return FALSE;
+	}
 
 	for (i = 0; i < phnum; i++) {
-		if (!get_phdr_memory(i, &load))
-			return FALSE;
-		if (load.p_type != PT_LOAD)
-			continue;
+		if (info->flag_vmcore_estimate) {
+			memset(&load, 0, sizeof(load) );
+			get_phdr_load(i , &load);
+		} else {
+			if (!get_phdr_memory(i, &load))
+				return FALSE;
+
+			if (load.p_type != PT_LOAD)
+				continue;
+		}
 
 		pfn_start = paddr_to_pfn(load.p_paddr);
 		pfn_end = paddr_to_pfn(load.p_paddr + load.p_memsz);
@@ -6016,17 +6045,26 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
 	pfn_user = pfn_free = pfn_hwpoison = 0;
 	pfn_memhole = info->max_mapnr;
 
-	if (!(phnum = get_phnum_memory()))
-		return FALSE;
+	if (info->flag_vmcore_estimate) {
+		phnum = num_pt_loads;
+	} else { 
+		if (!(phnum = get_phnum_memory()))
+			return FALSE;
+	}
 
 	gettimeofday(&tv_start, NULL);
 
 	for (i = 0; i < phnum; i++) {
-		if (!get_phdr_memory(i, &load))
-			return FALSE;
+		if (info->flag_vmcore_estimate) {
+			memset(&load, 0, sizeof(load));
+			get_phdr_load(i , &load);
+		} else {
+			if (!get_phdr_memory(i, &load))
+				return FALSE;
 
-		if (load.p_type != PT_LOAD)
-			continue;
+			if (load.p_type != PT_LOAD)
+				continue;
+		}
 
 		off_memory= load.p_offset;
 		paddr = load.p_paddr;
@@ -8929,6 +8967,13 @@ check_param_for_creating_dumpfile(int argc, char *argv[])
 		 */
 		info->name_memory   = argv[optind];
 
+	} else if ((argc == optind + 2) && info->flag_vmcore_estimate) {
+		/*
+	 * Parameters for reading /proc/kcore to estimate
+	 * the size of the dumped vmcore
+		 */
+		info->name_memory   = argv[optind];
+		info->name_dumpfile = argv[optind+1];
 	} else
 		return FALSE;
 
@@ -9011,6 +9056,332 @@ out:
 	return free_size;
 }
 
+struct memory_range {
+        unsigned long long start, end;
+};
+
+#define CRASH_RESERVED_MEM_NR   8
+static struct memory_range crash_reserved_mem[CRASH_RESERVED_MEM_NR];
+static int crash_reserved_mem_nr;
+
+/*
+ * iomem_for_each_line()
+ *
+ * Iterate over each line in the file returned by proc_iomem(). If match is
+ * NULL or if the line matches with our match-pattern then call the
+ * callback if non-NULL.
+ *
+ * Return the number of lines matched.
+ */
+int iomem_for_each_line(char *match,
+			      int (*callback)(void *data,
+					      int nr,
+					      char *str,
+					      unsigned long base,
+					      unsigned long length),
+			      void *data)
+{
+	const char iomem[] = "/proc/iomem";
+	char line[MAX_LINE];
+	FILE *fp;
+	unsigned long long start, end, size;
+	char *str;
+	int consumed;
+	int count;
+	int nr = 0;
+
+	fp = fopen(iomem, "r");
+	if (!fp) {
+		ERRMSG("Cannot open %s\n", iomem);
+		exit(1);
+	}
+
+	while(fgets(line, sizeof(line), fp) != 0) {
+		count = sscanf(line, "%Lx-%Lx : %n", &start, &end, &consumed);
+		if (count != 2)
+			continue;
+		str = line + consumed;
+		size = end - start + 1;
+		if (!match || memcmp(str, match, strlen(match)) == 0) {
+			if (callback
+			    && callback(data, nr, str, start, size) < 0) {
+				break;
+			}
+			nr++;
+		}
+	}
+
+	fclose(fp);
+
+	return nr;
+}
+
+static int crashkernel_mem_callback(void *data, int nr,
+                                          char *str,
+                                          unsigned long base,
+                                          unsigned long length)
+{
+        if (nr >= CRASH_RESERVED_MEM_NR)
+                return 1;
+
+        crash_reserved_mem[nr].start = base;
+        crash_reserved_mem[nr].end   = base + length - 1;
+        return 0;
+}
+
+int is_crashkernel_mem_reserved(void)
+{
+        int ret;
+
+        ret = iomem_for_each_line("Crash kernel\n",
+                                        crashkernel_mem_callback, NULL);
+        crash_reserved_mem_nr = ret;
+
+        return !!crash_reserved_mem_nr;
+}
+
+/* Returns the physical address of start of crash notes buffer for a kernel. */
+static int get_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
+{
+	char line[MAX_LINE];
+	int count;
+	FILE *fp;
+	unsigned long long temp, temp2;
+
+	*addr = 0;
+	*len = 0;
+
+	if (!(fp = fopen("/sys/kernel/vmcoreinfo", "r")))
+		return -1;
+
+	if (!fgets(line, sizeof(line), fp))
+		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
+	count = sscanf(line, "%Lx %Lx", &temp, &temp2);
+	if (count != 2)
+		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
+
+	*addr = (uint64_t) temp;
+	*len = (uint64_t) temp2;
+
+	fclose(fp);
+	return 0;
+}
+
+
+static int exclude_segment(struct pt_load_segment **pt_loads, unsigned int	*num_pt_loads, uint64_t start, uint64_t end)
+{
+        int i, j, tidx = -1;
+	unsigned long long	vstart, vend, kvstart, kvend;
+        struct pt_load_segment temp_seg = {0};
+	kvstart = (ulong)start | PAGE_OFFSET;
+	kvend = (ulong)end | PAGE_OFFSET;
+	unsigned long size;
+
+        for (i = 0; i < (*num_pt_loads); i++) {
+                vstart = (*pt_loads)[i].virt_start;
+                vend = (*pt_loads)[i].virt_end;
+                if (kvstart <  vend && kvend > vstart) {
+                        if (kvstart != vstart && kvend != vend) {
+				/* Split load segment */
+				temp_seg.phys_start = end +1;
+				temp_seg.phys_end = (*pt_loads)[i].phys_end;
+				temp_seg.virt_start = kvend + 1;
+				temp_seg.virt_end = vend;
+				temp_seg.file_offset = (*pt_loads)[i].file_offset + temp_seg.virt_start - (*pt_loads)[i].virt_start;
+
+				(*pt_loads)[i].virt_end = kvstart - 1;
+				(*pt_loads)[i].phys_end =  start -1;
+
+				tidx = i+1;
+                        } else if (kvstart != vstart) {
+				(*pt_loads)[i].phys_end = start - 1;
+				(*pt_loads)[i].virt_end = kvstart - 1;
+                        } else {
+				(*pt_loads)[i].phys_start = end + 1;
+				(*pt_loads)[i].virt_start = kvend + 1;
+                        }
+                }
+        }
+        /* Insert split load segment, if any. */
+	if (tidx >= 0) {
+		size = (*num_pt_loads + 1) * sizeof((*pt_loads)[0]);
+		(*pt_loads) = realloc((*pt_loads), size);
+		if  (!(*pt_loads) ) {
+		    ERRMSG("Cannot realloc %ld bytes: %s\n",
+		            size + 0UL, strerror(errno));
+			exit(1);
+		}
+		for (j = (*num_pt_loads - 1); j >= tidx; j--)
+		        (*pt_loads)[j+1] = (*pt_loads)[j];
+		(*pt_loads)[tidx] = temp_seg;
+		(*num_pt_loads)++;
+        }
+        return 0;
+}
+
+static int
+process_dump_load(struct pt_load_segment	*pls)
+{
+	unsigned long long paddr;
+
+	paddr = vaddr_to_paddr(pls->virt_start);
+	pls->phys_start  = paddr;
+	pls->phys_end    = paddr + (pls->virt_end - pls->virt_start);
+	MSG("process_dump_load\n");
+	MSG("  phys_start : %llx\n", pls->phys_start);
+	MSG("  phys_end   : %llx\n", pls->phys_end);
+	MSG("  virt_start : %llx\n", pls->virt_start);
+	MSG("  virt_end   : %llx\n", pls->virt_end);
+
+	return TRUE;
+}
+
+int get_kcore_dump_loads()
+{
+	struct pt_load_segment	*pls;
+	int i, j, loads=0;
+	unsigned long long paddr;
+
+	for (i = 0; i < num_pt_loads; ++i) {
+		struct pt_load_segment *p = &pt_loads[i];
+		if (is_vmalloc_addr(p->virt_start))
+			continue;
+		loads++;
+	}
+
+	pls = calloc(sizeof(struct pt_load_segment), loads);
+	if (pls == NULL) {
+		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
+		    strerror(errno));
+		return FALSE;
+	}
+
+	for (i = 0, j=0; i < num_pt_loads; ++i) {
+		struct pt_load_segment *p = &pt_loads[i];
+		if (is_vmalloc_addr(p->virt_start))
+			continue;
+		if (j >= loads)
+			return FALSE;
+
+		if (j == 0) {
+			offset_pt_load_memory = p->file_offset;
+			if (offset_pt_load_memory == 0) {
+				ERRMSG("Can't get the offset of page data.\n");
+				return FALSE;
+			}
+		}
+
+		pls[j] = *p;
+		process_dump_load(&pls[j]);
+		j++;
+	}
+
+	free(pt_loads);
+	pt_loads = pls;
+	num_pt_loads = loads;
+
+	for (i=0; i<crash_reserved_mem_nr; i++)
+	{
+		exclude_segment(&pt_loads, &num_pt_loads, crash_reserved_mem[i].start, crash_reserved_mem[i].end);
+	}
+
+	max_file_offset = 0;
+	for (i = 0; i < num_pt_loads; ++i) {
+		struct pt_load_segment *p = &pt_loads[i];
+		max_file_offset = MAX(max_file_offset,
+				      p->file_offset + p->phys_end - p->phys_start);
+	}
+
+	for (i = 0; i < num_pt_loads; ++i) {
+		struct pt_load_segment *p = &pt_loads[i];
+		MSG("LOAD (%d)\n", i);
+		MSG("  phys_start : %llx\n", p->phys_start);
+		MSG("  phys_end   : %llx\n", p->phys_end);
+		MSG("  virt_start : %llx\n", p->virt_start);
+		MSG("  virt_end   : %llx\n", p->virt_end);
+	}
+
+	return TRUE;
+}
+
+int get_page_offset()
+{
+	struct utsname utsname;
+	if (uname(&utsname)) {
+		ERRMSG("Cannot get name and information about current kernel : %s", strerror(errno));
+		return FALSE;
+	}
+
+	info->kernel_version = get_kernel_version(utsname.release);
+	get_versiondep_info_x86_64();
+	return TRUE;
+}
+
+int vmcore_estimate(void)
+{
+	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
+	int num_retry = 0, status;
+
+	if (!is_crashkernel_mem_reserved()) {
+		ERRMSG("No memory is reserved for crashkernel!\n");
+		exit(1);
+	}
+
+	get_page_offset();
+
+#if 1
+	if (!open_dump_memory())
+		return FALSE;
+#endif
+
+	if (info->flag_vmcore_estimate) {
+		if (!get_elf_loads(info->fd_memory, info->name_memory))
+			return FALSE;
+	}
+
+	if (get_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
+		return FALSE;
+
+	if (set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
+		return FALSE;
+
+	if (!get_kcore_dump_loads())
+		return FALSE;
+
+#if 1
+	if (!initial())
+		return FALSE;
+#endif
+
+retry:
+	if (!create_dump_bitmap())
+		return FALSE;
+
+	if ((status = writeout_dumpfile()) == FALSE)
+		return FALSE;
+
+	if (status == NOSPACE) {
+		/*
+		 * If specifying the other dump_level, makedumpfile tries
+		 * to create a dumpfile with it again.
+		 */
+		num_retry++;
+		if ((info->dump_level = get_next_dump_level(num_retry)) < 0)
+			return FALSE;
+		MSG("Retry to create a dumpfile by dump_level(%d).\n",
+		    info->dump_level);
+		if (!delete_dumpfile())
+			return FALSE;
+		goto retry;
+	}
+	print_report();
+
+	clear_filter_info();
+	if (!close_files_for_creating_dumpfile())
+		return FALSE;
+
+	return TRUE;
+}
 
 /*
  * Choose the lesser value of the two below as the size of cyclic buffer.
@@ -9063,6 +9434,7 @@ static struct option longopts[] = {
 	{"cyclic-buffer", required_argument, NULL, OPT_CYCLIC_BUFFER},
 	{"eppic", required_argument, NULL, OPT_EPPIC},
 	{"non-mmap", no_argument, NULL, OPT_NON_MMAP},
+	{"vmcore-estimate", no_argument, NULL, OPT_VMCORE_ESTIMATE},
 	{0, 0, 0, 0}
 };
 
@@ -9154,6 +9526,9 @@ main(int argc, char *argv[])
 		case OPT_DUMP_DMESG:
 			info->flag_dmesg = 1;
 			break;
+		case OPT_VMCORE_ESTIMATE:
+			info->flag_vmcore_estimate = 1;
+			break;
 		case OPT_COMPRESS_SNAPPY:
 			info->flag_compress = DUMP_DH_COMPRESSED_SNAPPY;
 			break;
@@ -9294,6 +9669,19 @@ main(int argc, char *argv[])
 
 		MSG("\n");
 		MSG("The dmesg log is saved to %s.\n", info->name_dumpfile);
+	} else if (info->flag_vmcore_estimate) {
+#if 1
+		if (!check_param_for_creating_dumpfile(argc, argv)) {
+			MSG("Commandline parameter is invalid.\n");
+			MSG("Try `makedumpfile --help' for more information.\n");
+			goto out;
+		}
+#endif
+		if (!vmcore_estimate())
+			goto out;
+
+		MSG("\n");
+		MSG("The vmcore size was estimated successfully.\n");
 	} else {
 		if (!check_param_for_creating_dumpfile(argc, argv)) {
 			MSG("Commandline parameter is invalid.\n");
diff --git a/makedumpfile.h b/makedumpfile.h
index 9402f05..c401337 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -216,6 +216,9 @@ isAnon(unsigned long mapping)
 #define FILENAME_STDOUT		"STDOUT"
 #define MAP_REGION		(4096*1024)
 
+#define MAX_LINE	160
+
+
 /*
  * Minimam vmcore has 2 ProgramHeaderTables(PT_NOTE and PT_LOAD).
  */
@@ -910,6 +913,7 @@ struct DumpInfo {
 	int		flag_force;	     /* overwrite existing stuff */
 	int		flag_exclude_xen_dom;/* exclude Domain-U from xen-kdump */
 	int             flag_dmesg;          /* dump the dmesg log out of the vmcore file */
+	int             flag_vmcore_estimate;          /* estimate the size  of vmcore in current system */
 	int		flag_use_printk_log; /* did we read printk_log symbol name? */
 	int		flag_nospace;	     /* the flag of "No space on device" error */
 	int		flag_vmemmap;        /* kernel supports vmemmap address space */
@@ -1764,6 +1768,7 @@ struct elf_prstatus {
 #define OPT_CYCLIC_BUFFER       OPT_START+11
 #define OPT_EPPIC               OPT_START+12
 #define OPT_NON_MMAP            OPT_START+13
+#define OPT_VMCORE_ESTIMATE            OPT_START+14
 
 /*
  * Function Prototype.
-- 
1.8.5.3


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-11 12:39 [PATCH] Makedumpfile: vmcore size estimate Baoquan He
@ 2014-06-11 13:57 ` Baoquan He
  2014-06-20  1:07   ` Atsushi Kumagai
  2014-06-23 12:36 ` Vivek Goyal
  1 sibling, 1 reply; 16+ messages in thread
From: Baoquan He @ 2014-06-11 13:57 UTC (permalink / raw)
  To: kexec; +Cc: kumagai-atsushi, vgoyal

Forgot to mention: only x86-64 is handled in this patch.

On 06/11/14 at 08:39pm, Baoquan He wrote:
> Users want to get a rough estimate of the vmcore size so they can decide
> how much storage space to reserve for vmcore dumping. This can help them
> deploy their machines better, possibly hundreds of machines.
> 
> In this draft patch, a new option is added:
>     "--vmcore-estimate"
> Users can execute the command below to get a dumped kcore. Since kcore is
> an ELF file that maps the whole memory of the current kernel, it is roughly
> equivalent to what the crash kernel would dump, though not exactly. The
> content of kcore is dynamic, whereas /proc/vmcore is fixed once a crash has
> happened. But for a vmcore size estimate, it is good enough.
> 
> sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
> 
> Questions:
> 1. Alternatively, we could get only the number of dumpable pages, then
> calculate the estimated vmcore size with a predefined factor for kdump
> compressed dumping. E.g. for an lzo dump, if we assume a compression ratio
> of 45%, the estimated size is: (number of dumpable pages) * 4096 * 45%.
> 
> This is easier but rather rough. Does anybody prefer this over the real
> dumping implemented in this draft patch?
> 
> 2. When dumping /proc/kcore there is still a bug I haven't been able to
> fix. For an ELF dump, write_elf_header() pre-calculates num_loads_dumpfile,
> the number of program segments that will be dumped. However, since the
> content of /proc/kcore is dynamic, the final num_loads_dumpfile may change
> by the time write_elf_pages_cyclic()/write_elf_pages() is called. This
> leaves the dumped ELF file with a bad file format. If you execute
> "readelf -a /var/crash/kcore-dump", you will be a little surprised.
> 
> 3. This is not a formal patch. Once the final solution is decided, I will
> post a patch, maybe a patchset. If you have suggestions about the code or
> implementation, please post your comments.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  elf_info.c     | 136 ++++++++++++++++--
>  elf_info.h     |  17 +++
>  makedumpfile.c | 438 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  makedumpfile.h |   5 +
>  4 files changed, 560 insertions(+), 36 deletions(-)
> 
> diff --git a/elf_info.c b/elf_info.c
> index b277f69..1b05ad1 100644
> --- a/elf_info.c
> +++ b/elf_info.c
> @@ -36,16 +36,9 @@
>  
>  #define XEN_ELFNOTE_CRASH_INFO	(0x1000001)
>  
> -struct pt_load_segment {
> -	off_t			file_offset;
> -	unsigned long long	phys_start;
> -	unsigned long long	phys_end;
> -	unsigned long long	virt_start;
> -	unsigned long long	virt_end;
> -};
>  
>  static int			nr_cpus;             /* number of cpu */
> -static off_t			max_file_offset;
> +off_t			max_file_offset;
>  
>  /*
>   * File information about /proc/vmcore:
> @@ -60,9 +53,9 @@ static int			flags_memory;
>  /*
>   * PT_LOAD information about /proc/vmcore:
>   */
> -static unsigned int		num_pt_loads;
> -static struct pt_load_segment	*pt_loads;
> -static off_t			offset_pt_load_memory;
> +unsigned int		num_pt_loads;
> +struct pt_load_segment	*pt_loads;
> +off_t			offset_pt_load_memory;
>  
>  /*
>   * PT_NOTE information about /proc/vmcore:
> @@ -395,7 +388,49 @@ get_pt_note_info(void)
>  	return TRUE;
>  }
>  
> +#define UNINITIALIZED  ((ulong)(-1))
>  
> +#define SEEK_ERROR       (-1)
> +#define READ_ERROR       (-2)
> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len)
> +{
> +	int i;
> +	ulong kvaddr;
> +	Elf64_Nhdr *note64;
> +	off_t offset;
> +	char note[MAX_SIZE_NHDR];
> +	int size_desc;
> +	off_t offset_desc;
> +
> +	offset = UNINITIALIZED;
> +	kvaddr = (ulong)vmcoreinfo_addr | PAGE_OFFSET;
> +
> +	for (i = 0; i < num_pt_loads; ++i) {
> +		struct pt_load_segment *p = &pt_loads[i];
> +		if ((kvaddr >= p->virt_start) && (kvaddr < p->virt_end)) {
> +			offset = (off_t)(kvaddr - p->virt_start) +
> +			(off_t)p->file_offset;
> +			break;
> +		}
> +	}
> +
> +	if (offset == UNINITIALIZED)
> +		return SEEK_ERROR;
> +
> +        if (lseek(fd_memory, offset, SEEK_SET) != offset)
> +		perror("lseek");
> +
> +	if (read(fd_memory, note, MAX_SIZE_NHDR) != MAX_SIZE_NHDR)
> +		return READ_ERROR;
> +
> +	note64 = (Elf64_Nhdr *)note;
> +	size_desc   = note_descsz(note);
> +	offset_desc = offset + offset_note_desc(note);
> +
> +	set_vmcoreinfo(offset_desc, size_desc);
> +
> +	return 0;
> +}
>  /*
>   * External functions.
>   */
> @@ -681,6 +716,55 @@ get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr)
>  	return TRUE;
>  }
>  
> +int
> +get_elf_loads(int fd, char *filename)
> +{
> +	int i, j, phnum, elf_format;
> +	Elf64_Phdr phdr;
> +
> +	/*
> +	 * Check ELF64 or ELF32.
> +	 */
> +	elf_format = check_elf_format(fd, filename, &phnum, &num_pt_loads);
> +	if (elf_format == ELF64)
> +		flags_memory |= MEMORY_ELF64;
> +	else if (elf_format != ELF32)
> +		return FALSE;
> +
> +	if (!num_pt_loads) {
> +		ERRMSG("Can't get the number of PT_LOAD.\n");
> +		return FALSE;
> +	}
> +
> +	/*
> +	 * The below file information will be used as /proc/vmcore.
> +	 */
> +	fd_memory   = fd;
> +	name_memory = filename;
> +
> +	pt_loads = calloc(sizeof(struct pt_load_segment), num_pt_loads);
> +	if (pt_loads == NULL) {
> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
> +		    strerror(errno));
> +		return FALSE;
> +	}
> +	for (i = 0, j = 0; i < phnum; i++) {
> +		if (!get_phdr_memory(i, &phdr))
> +			return FALSE;
> +
> +		if (phdr.p_type != PT_LOAD)
> +			continue;
> +
> +		if (j >= num_pt_loads)
> +			return FALSE;
> +		if(!dump_Elf_load(&phdr, j))
> +			return FALSE;
> +		j++;
> +	}
> +
> +	return TRUE;
> +}
> +
>  /*
>   * Get ELF information about /proc/vmcore.
>   */
> @@ -826,6 +910,36 @@ get_phdr_memory(int index, Elf64_Phdr *phdr)
>  	return TRUE;
>  }
>  
> +int
> +get_phdr_load(int index, Elf64_Phdr *phdr)
> +{
> +	Elf32_Phdr phdr32;
> +
> +	if (is_elf64_memory()) { /* ELF64 */
> +		phdr->p_type = PT_LOAD;
> +		phdr->p_vaddr = pt_loads[index].virt_start;
> +		phdr->p_paddr = pt_loads[index].phys_start;
> +		phdr->p_memsz  = pt_loads[index].phys_end - pt_loads[index].phys_start;
> +		phdr->p_filesz = phdr->p_memsz;
> +		phdr->p_offset = pt_loads[index].file_offset;
> +	} else {
> +		if (!get_elf32_phdr(fd_memory, name_memory, index, &phdr32)) {
> +			ERRMSG("Can't find Phdr %d.\n", index);
> +			return FALSE;
> +		}
> +		memset(phdr, 0, sizeof(Elf64_Phdr));
> +		phdr->p_type   = phdr32.p_type;
> +		phdr->p_flags  = phdr32.p_flags;
> +		phdr->p_offset = phdr32.p_offset;
> +		phdr->p_vaddr  = phdr32.p_vaddr;
> +		phdr->p_paddr  = phdr32.p_paddr;
> +		phdr->p_filesz = phdr32.p_filesz;
> +		phdr->p_memsz  = phdr32.p_memsz;
> +		phdr->p_align  = phdr32.p_align;
> +	}
> +	return TRUE;
> +}
> +
>  off_t
>  get_offset_pt_load_memory(void)
>  {
> diff --git a/elf_info.h b/elf_info.h
> index 801faff..0c67d74 100644
> --- a/elf_info.h
> +++ b/elf_info.h
> @@ -27,6 +27,19 @@
>  
>  #define MAX_SIZE_NHDR	MAX(sizeof(Elf64_Nhdr), sizeof(Elf32_Nhdr))
>  
> +struct pt_load_segment {
> +	off_t			file_offset;
> +	unsigned long long	phys_start;
> +	unsigned long long	phys_end;
> +	unsigned long long	virt_start;
> +	unsigned long long	virt_end;
> +};
> +
> +extern off_t			max_file_offset;
> +extern unsigned int		num_pt_loads;
> +extern struct pt_load_segment	*pt_loads;
> +
> +extern off_t			offset_pt_load_memory;
>  
>  off_t paddr_to_offset(unsigned long long paddr);
>  off_t paddr_to_offset2(unsigned long long paddr, off_t hint);
> @@ -44,11 +57,14 @@ int get_elf64_ehdr(int fd, char *filename, Elf64_Ehdr *ehdr);
>  int get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr);
>  int get_elf_info(int fd, char *filename);
>  void free_elf_info(void);
> +int get_elf_loads(int fd, char *filename);
>  
>  int is_elf64_memory(void);
>  int is_xen_memory(void);
>  
>  int get_phnum_memory(void);
> +
> +int get_phdr_load(int index, Elf64_Phdr *phdr);
>  int get_phdr_memory(int index, Elf64_Phdr *phdr);
>  off_t get_offset_pt_load_memory(void);
>  int get_pt_load(int idx,
> @@ -68,6 +84,7 @@ void get_pt_note(off_t *offset, unsigned long *size);
>  int has_vmcoreinfo(void);
>  void set_vmcoreinfo(off_t offset, unsigned long size);
>  void get_vmcoreinfo(off_t *offset, unsigned long *size);
> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len);
>  
>  int has_vmcoreinfo_xen(void);
>  void get_vmcoreinfo_xen(off_t *offset, unsigned long *size);
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 34db997..ac02747 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -5146,6 +5146,7 @@ create_dump_bitmap(void)
>  
>  	if (info->flag_cyclic) {
>  
> +		printf("create_dump_bitmap flag_cyclic\n");
>  		if (info->flag_elf_dumpfile) {
>  			if (!prepare_bitmap_buffer_cyclic())
>  				goto out;
> @@ -5189,14 +5190,23 @@ get_loads_dumpfile(void)
>  
>  	initialize_2nd_bitmap(&bitmap2);
>  
> -	if (!(phnum = get_phnum_memory()))
> -		return FALSE;
> -
> -	for (i = 0; i < phnum; i++) {
> -		if (!get_phdr_memory(i, &load))
> +	if (info->flag_vmcore_estimate) {
> +		phnum = num_pt_loads;
> +	} else {
> +		if (!(phnum = get_phnum_memory()))
>  			return FALSE;
> -		if (load.p_type != PT_LOAD)
> -			continue;
> +	}
> +
> +	for (i = 0; i < num_pt_loads; i++) {
> +		if (info->flag_vmcore_estimate) {
> +			get_phdr_load(i , &load);
> +		} else {
> +			if (!get_phdr_memory(i, &load))
> +				return FALSE;
> +
> +			if (load.p_type != PT_LOAD)
> +				continue;
> +		}
>  
>  		pfn_start = paddr_to_pfn(load.p_paddr);
>  		pfn_end   = paddr_to_pfn(load.p_paddr + load.p_memsz);
> @@ -5734,17 +5744,26 @@ write_elf_pages(struct cache_data *cd_header, struct cache_data *cd_page)
>  	off_seg_load    = info->offset_load_dumpfile;
>  	cd_page->offset = info->offset_load_dumpfile;
>  
> -	if (!(phnum = get_phnum_memory()))
> -		return FALSE;
> +	if (info->flag_vmcore_estimate) {
> +		phnum = num_pt_loads;
> +	} else { 
> +		if (!(phnum = get_phnum_memory()))
> +			return FALSE;
> +	}
>  
>  	gettimeofday(&tv_start, NULL);
>  
>  	for (i = 0; i < phnum; i++) {
> -		if (!get_phdr_memory(i, &load))
> -			return FALSE;
> +		if (info->flag_vmcore_estimate) {
> +			memset(&load, 0, sizeof(load));
> +			get_phdr_load(i , &load);
> +		} else {
> +			if (!get_phdr_memory(i, &load))
> +				return FALSE;
>  
> -		if (load.p_type != PT_LOAD)
> -			continue;
> +			if (load.p_type != PT_LOAD)
> +				continue;
> +		}
>  
>  		off_memory= load.p_offset;
>  		paddr     = load.p_paddr;
> @@ -5923,14 +5942,24 @@ get_loads_dumpfile_cyclic(void)
>  	Elf64_Phdr load;
>  	struct cycle cycle = {0};
>  
> -	if (!(phnum = get_phnum_memory()))
> -		return FALSE;
> +	if (info->flag_vmcore_estimate) {
> +		phnum = num_pt_loads;
> +	} else {
> +		if (!(phnum = get_phnum_memory()))
> +			return FALSE;
> +	}
>  
>  	for (i = 0; i < phnum; i++) {
> -		if (!get_phdr_memory(i, &load))
> -			return FALSE;
> -		if (load.p_type != PT_LOAD)
> -			continue;
> +		if (info->flag_vmcore_estimate) {
> +			memset(&load, 0, sizeof(load) );
> +			get_phdr_load(i , &load);
> +		} else {
> +			if (!get_phdr_memory(i, &load))
> +				return FALSE;
> +
> +			if (load.p_type != PT_LOAD)
> +				continue;
> +		}
>  
>  		pfn_start = paddr_to_pfn(load.p_paddr);
>  		pfn_end = paddr_to_pfn(load.p_paddr + load.p_memsz);
> @@ -6016,17 +6045,26 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
>  	pfn_user = pfn_free = pfn_hwpoison = 0;
>  	pfn_memhole = info->max_mapnr;
>  
> -	if (!(phnum = get_phnum_memory()))
> -		return FALSE;
> +	if (info->flag_vmcore_estimate) {
> +		phnum = num_pt_loads;
> +	} else { 
> +		if (!(phnum = get_phnum_memory()))
> +			return FALSE;
> +	}
>  
>  	gettimeofday(&tv_start, NULL);
>  
>  	for (i = 0; i < phnum; i++) {
> -		if (!get_phdr_memory(i, &load))
> -			return FALSE;
> +		if (info->flag_vmcore_estimate) {
> +			memset(&load, 0, sizeof(load));
> +			get_phdr_load(i , &load);
> +		} else {
> +			if (!get_phdr_memory(i, &load))
> +				return FALSE;
>  
> -		if (load.p_type != PT_LOAD)
> -			continue;
> +			if (load.p_type != PT_LOAD)
> +				continue;
> +		}
>  
>  		off_memory= load.p_offset;
>  		paddr = load.p_paddr;
> @@ -8929,6 +8967,13 @@ check_param_for_creating_dumpfile(int argc, char *argv[])
>  		 */
>  		info->name_memory   = argv[optind];
>  
> +	} else if ((argc == optind + 2) && info->flag_vmcore_estimate) {
> +		/*
> +	 * Parameters for reading /proc/kcore to estimate
> +	 * the size of the dumped vmcore
> +		 */
> +		info->name_memory   = argv[optind];
> +		info->name_dumpfile = argv[optind+1];
>  	} else
>  		return FALSE;
>  
> @@ -9011,6 +9056,332 @@ out:
>  	return free_size;
>  }
>  
> +struct memory_range {
> +        unsigned long long start, end;
> +};
> +
> +#define CRASH_RESERVED_MEM_NR   8
> +static struct memory_range crash_reserved_mem[CRASH_RESERVED_MEM_NR];
> +static int crash_reserved_mem_nr;
> +
> +/*
> + * iomem_for_each_line()
> + *
> + * Iterate over each line in the file returned by proc_iomem(). If match is
> + * NULL or if the line matches with our match-pattern then call the
> + * callback if non-NULL.
> + *
> + * Return the number of lines matched.
> + */
> +int iomem_for_each_line(char *match,
> +			      int (*callback)(void *data,
> +					      int nr,
> +					      char *str,
> +					      unsigned long base,
> +					      unsigned long length),
> +			      void *data)
> +{
> +	const char iomem[] = "/proc/iomem";
> +	char line[MAX_LINE];
> +	FILE *fp;
> +	unsigned long long start, end, size;
> +	char *str;
> +	int consumed;
> +	int count;
> +	int nr = 0;
> +
> +	fp = fopen(iomem, "r");
> +	if (!fp) {
> +		ERRMSG("Cannot open %s\n", iomem);
> +		exit(1);
> +	}
> +
> +	while(fgets(line, sizeof(line), fp) != 0) {
> +		count = sscanf(line, "%Lx-%Lx : %n", &start, &end, &consumed);
> +		if (count != 2)
> +			continue;
> +		str = line + consumed;
> +		size = end - start + 1;
> +		if (!match || memcmp(str, match, strlen(match)) == 0) {
> +			if (callback
> +			    && callback(data, nr, str, start, size) < 0) {
> +				break;
> +			}
> +			nr++;
> +		}
> +	}
> +
> +	fclose(fp);
> +
> +	return nr;
> +}
> +
> +static int crashkernel_mem_callback(void *data, int nr,
> +                                          char *str,
> +                                          unsigned long base,
> +                                          unsigned long length)
> +{
> +        if (nr >= CRASH_RESERVED_MEM_NR)
> +                return 1;
> +
> +        crash_reserved_mem[nr].start = base;
> +        crash_reserved_mem[nr].end   = base + length - 1;
> +        return 0;
> +}
> +
> +int is_crashkernel_mem_reserved(void)
> +{
> +        int ret;
> +
> +        ret = iomem_for_each_line("Crash kernel\n",
> +                                        crashkernel_mem_callback, NULL);
> +        crash_reserved_mem_nr = ret;
> +
> +        return !!crash_reserved_mem_nr;
> +}
> +
> +/* Returns the physical address of start of crash notes buffer for a kernel. */
> +static int get_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
> +{
> +	char line[MAX_LINE];
> +	int count;
> +	FILE *fp;
> +	unsigned long long temp, temp2;
> +
> +	*addr = 0;
> +	*len = 0;
> +
> +	if (!(fp = fopen("/sys/kernel/vmcoreinfo", "r")))
> +		return -1;
> +
> +	if (!fgets(line, sizeof(line), fp))
> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
> +	count = sscanf(line, "%Lx %Lx", &temp, &temp2);
> +	if (count != 2)
> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
> +
> +	*addr = (uint64_t) temp;
> +	*len = (uint64_t) temp2;
> +
> +	fclose(fp);
> +	return 0;
> +}
> +
> +
> +static int exclude_segment(struct pt_load_segment **pt_loads, unsigned int	*num_pt_loads, uint64_t start, uint64_t end)
> +{
> +        int i, j, tidx = -1;
> +	unsigned long long	vstart, vend, kvstart, kvend;
> +        struct pt_load_segment temp_seg = {0};
> +	kvstart = (ulong)start | PAGE_OFFSET;
> +	kvend = (ulong)end | PAGE_OFFSET;
> +	unsigned long size;
> +
> +        for (i = 0; i < (*num_pt_loads); i++) {
> +                vstart = (*pt_loads)[i].virt_start;
> +                vend = (*pt_loads)[i].virt_end;
> +                if (kvstart <  vend && kvend > vstart) {
> +                        if (kvstart != vstart && kvend != vend) {
> +				/* Split load segment */
> +				temp_seg.phys_start = end +1;
> +				temp_seg.phys_end = (*pt_loads)[i].phys_end;
> +				temp_seg.virt_start = kvend + 1;
> +				temp_seg.virt_end = vend;
> +				temp_seg.file_offset = (*pt_loads)[i].file_offset + temp_seg.virt_start - (*pt_loads)[i].virt_start;
> +
> +				(*pt_loads)[i].virt_end = kvstart - 1;
> +				(*pt_loads)[i].phys_end =  start -1;
> +
> +				tidx = i+1;
> +                        } else if (kvstart != vstart) {
> +				(*pt_loads)[i].phys_end = start - 1;
> +				(*pt_loads)[i].virt_end = kvstart - 1;
> +                        } else {
> +				(*pt_loads)[i].phys_start = end + 1;
> +				(*pt_loads)[i].virt_start = kvend + 1;
> +                        }
> +                }
> +        }
> +        /* Insert split load segment, if any. */
> +	if (tidx >= 0) {
> +		size = (*num_pt_loads + 1) * sizeof((*pt_loads)[0]);
> +		(*pt_loads) = realloc((*pt_loads), size);
> +		if  (!(*pt_loads) ) {
> +		    ERRMSG("Cannot realloc %ld bytes: %s\n",
> +		            size + 0UL, strerror(errno));
> +			exit(1);
> +		}
> +		for (j = (*num_pt_loads - 1); j >= tidx; j--)
> +		        (*pt_loads)[j+1] = (*pt_loads)[j];
> +		(*pt_loads)[tidx] = temp_seg;
> +		(*num_pt_loads)++;
> +        }
> +        return 0;
> +}
> +
> +static int
> +process_dump_load(struct pt_load_segment	*pls)
> +{
> +	unsigned long long paddr;
> +
> +	paddr = vaddr_to_paddr(pls->virt_start);
> +	pls->phys_start  = paddr;
> +	pls->phys_end    = paddr + (pls->virt_end - pls->virt_start);
> +	MSG("process_dump_load\n");
> +	MSG("  phys_start : %llx\n", pls->phys_start);
> +	MSG("  phys_end   : %llx\n", pls->phys_end);
> +	MSG("  virt_start : %llx\n", pls->virt_start);
> +	MSG("  virt_end   : %llx\n", pls->virt_end);
> +
> +	return TRUE;
> +}
> +
> +int get_kcore_dump_loads()
> +{
> +	struct pt_load_segment	*pls;
> +	int i, j, loads=0;
> +	unsigned long long paddr;
> +
> +	for (i = 0; i < num_pt_loads; ++i) {
> +		struct pt_load_segment *p = &pt_loads[i];
> +		if (is_vmalloc_addr(p->virt_start))
> +			continue;
> +		loads++;
> +	}
> +
> +	pls = calloc(sizeof(struct pt_load_segment), loads);
> +	if (pls == NULL) {
> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
> +		    strerror(errno));
> +		return FALSE;
> +	}
> +
> +	for (i = 0, j=0; i < num_pt_loads; ++i) {
> +		struct pt_load_segment *p = &pt_loads[i];
> +		if (is_vmalloc_addr(p->virt_start))
> +			continue;
> +		if (j >= loads)
> +			return FALSE;
> +
> +		if (j == 0) {
> +			offset_pt_load_memory = p->file_offset;
> +			if (offset_pt_load_memory == 0) {
> +				ERRMSG("Can't get the offset of page data.\n");
> +				return FALSE;
> +			}
> +		}
> +
> +		pls[j] = *p;
> +		process_dump_load(&pls[j]);
> +		j++;
> +	}
> +
> +	free(pt_loads);
> +	pt_loads = pls;
> +	num_pt_loads = loads;
> +
> +	for (i=0; i<crash_reserved_mem_nr; i++)
> +	{
> +		exclude_segment(&pt_loads, &num_pt_loads, crash_reserved_mem[i].start, crash_reserved_mem[i].end);
> +	}
> +
> +	max_file_offset = 0;
> +	for (i = 0; i < num_pt_loads; ++i) {
> +		struct pt_load_segment *p = &pt_loads[i];
> +		max_file_offset = MAX(max_file_offset,
> +				      p->file_offset + p->phys_end - p->phys_start);
> +	}
> +
> +	for (i = 0; i < num_pt_loads; ++i) {
> +		struct pt_load_segment *p = &pt_loads[i];
> +		MSG("LOAD (%d)\n", i);
> +		MSG("  phys_start : %llx\n", p->phys_start);
> +		MSG("  phys_end   : %llx\n", p->phys_end);
> +		MSG("  virt_start : %llx\n", p->virt_start);
> +		MSG("  virt_end   : %llx\n", p->virt_end);
> +	}
> +
> +	return TRUE;
> +}
> +
> +int get_page_offset()
> +{
> +	struct utsname utsname;
> +	if (uname(&utsname)) {
> +		ERRMSG("Cannot get name and information about current kernel : %s", strerror(errno));
> +		return FALSE;
> +	}
> +
> +	info->kernel_version = get_kernel_version(utsname.release);
> +	get_versiondep_info_x86_64();
> +	return TRUE;
> +}
> +
> +int vmcore_estimate(void)
> +{
> +	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
> +	int num_retry = 0, status;
> +
> +	if (!is_crashkernel_mem_reserved()) {
> +		ERRMSG("No memory is reserved for crashkernel!\n");
> +		exit(1);
> +	}
> +
> +	get_page_offset();
> +
> +#if 1
> +	if (!open_dump_memory())
> +		return FALSE;
> +#endif
> +
> +	if (info->flag_vmcore_estimate) {
> +		if (!get_elf_loads(info->fd_memory, info->name_memory))
> +			return FALSE;
> +	}
> +
> +	if (get_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> +		return FALSE;
> +
> +	if (set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> +		return FALSE;
> +
> +	if (!get_kcore_dump_loads())
> +		return FALSE;
> +
> +#if 1
> +	if (!initial())
> +		return FALSE;
> +#endif
> +
> +retry:
> +	if (!create_dump_bitmap())
> +		return FALSE;
> +
> +	if ((status = writeout_dumpfile()) == FALSE)
> +		return FALSE;
> +
> +	if (status == NOSPACE) {
> +		/*
> +		 * If specifying the other dump_level, makedumpfile tries
> +		 * to create a dumpfile with it again.
> +		 */
> +		num_retry++;
> +		if ((info->dump_level = get_next_dump_level(num_retry)) < 0)
> +			return FALSE;
> +		MSG("Retry to create a dumpfile by dump_level(%d).\n",
> +		    info->dump_level);
> +		if (!delete_dumpfile())
> +			return FALSE;
> +		goto retry;
> +	}
> +	print_report();
> +
> +	clear_filter_info();
> +	if (!close_files_for_creating_dumpfile())
> +		return FALSE;
> +
> +	return TRUE;
> +}
>  
>  /*
>   * Choose the lesser value of the two below as the size of cyclic buffer.
> @@ -9063,6 +9434,7 @@ static struct option longopts[] = {
>  	{"cyclic-buffer", required_argument, NULL, OPT_CYCLIC_BUFFER},
>  	{"eppic", required_argument, NULL, OPT_EPPIC},
>  	{"non-mmap", no_argument, NULL, OPT_NON_MMAP},
> +	{"vmcore-estimate", no_argument, NULL, OPT_VMCORE_ESTIMATE},
>  	{0, 0, 0, 0}
>  };
>  
> @@ -9154,6 +9526,9 @@ main(int argc, char *argv[])
>  		case OPT_DUMP_DMESG:
>  			info->flag_dmesg = 1;
>  			break;
> +		case OPT_VMCORE_ESTIMATE:
> +			info->flag_vmcore_estimate = 1;
> +			break;
>  		case OPT_COMPRESS_SNAPPY:
>  			info->flag_compress = DUMP_DH_COMPRESSED_SNAPPY;
>  			break;
> @@ -9294,6 +9669,19 @@ main(int argc, char *argv[])
>  
>  		MSG("\n");
>  		MSG("The dmesg log is saved to %s.\n", info->name_dumpfile);
> +	} else if (info->flag_vmcore_estimate) {
> +#if 1
> +		if (!check_param_for_creating_dumpfile(argc, argv)) {
> +			MSG("Commandline parameter is invalid.\n");
> +			MSG("Try `makedumpfile --help' for more information.\n");
> +			goto out;
> +		}
> +#endif
> +		if (!vmcore_estimate())
> +			goto out;
> +
> +		MSG("\n");
> +		MSG("The vmcore size was estimated successfully.\n");
>  	} else {
>  		if (!check_param_for_creating_dumpfile(argc, argv)) {
>  			MSG("Commandline parameter is invalid.\n");
> diff --git a/makedumpfile.h b/makedumpfile.h
> index 9402f05..c401337 100644
> --- a/makedumpfile.h
> +++ b/makedumpfile.h
> @@ -216,6 +216,9 @@ isAnon(unsigned long mapping)
>  #define FILENAME_STDOUT		"STDOUT"
>  #define MAP_REGION		(4096*1024)
>  
> +#define MAX_LINE	160
> +
> +
>  /*
>   * Minimam vmcore has 2 ProgramHeaderTables(PT_NOTE and PT_LOAD).
>   */
> @@ -910,6 +913,7 @@ struct DumpInfo {
>  	int		flag_force;	     /* overwrite existing stuff */
>  	int		flag_exclude_xen_dom;/* exclude Domain-U from xen-kdump */
>  	int             flag_dmesg;          /* dump the dmesg log out of the vmcore file */
> +	int             flag_vmcore_estimate;          /* estimate the size  of vmcore in current system */
>  	int		flag_use_printk_log; /* did we read printk_log symbol name? */
>  	int		flag_nospace;	     /* the flag of "No space on device" error */
>  	int		flag_vmemmap;        /* kernel supports vmemmap address space */
> @@ -1764,6 +1768,7 @@ struct elf_prstatus {
>  #define OPT_CYCLIC_BUFFER       OPT_START+11
>  #define OPT_EPPIC               OPT_START+12
>  #define OPT_NON_MMAP            OPT_START+13
> +#define OPT_VMCORE_ESTIMATE            OPT_START+14
>  
>  /*
>   * Function Prototype.
> -- 
> 1.8.5.3
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* RE: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-11 13:57 ` Baoquan He
@ 2014-06-20  1:07   ` Atsushi Kumagai
  2014-06-20  1:58     ` bhe
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Atsushi Kumagai @ 2014-06-20  1:07 UTC (permalink / raw)
  To: bhe; +Cc: kexec, vgoyal

Hello Baoquan,

>Forgot to mention: only x86-64 is handled in this patch.
>
>On 06/11/14 at 08:39pm, Baoquan He wrote:
>> Users want to get a rough estimate of the vmcore size so they can decide
>> how much storage space to reserve for vmcore dumping. This can help them
>> deploy their machines better, possibly hundreds of machines.

You suggested this feature before, but I still don't agree with it.

No one can guarantee that the vmcore size will be below the estimated
size every time. However, if makedumpfile provides "--vmcore-estimate",
some users may trust it completely and a disk overflow might happen.
Ideally, users should prepare a disk that can store the maximum possible
size of the vmcore. Of course they can reduce the disk size on their own
responsibility, but makedumpfile can't support that as an official feature.


Thanks
Atsushi Kumagai

>> In this draft patch, a new option is added:
>>     "--vmcore-estimate"
>> Users can execute the command below to get a dumped kcore. Since kcore is
>> an ELF file that maps the whole memory of the current kernel, it is roughly
>> equivalent to what the crash kernel would dump, though not exactly. The
>> content of kcore is dynamic, whereas /proc/vmcore is fixed once a crash has
>> happened. But for a vmcore size estimate, it is good enough.
>>
>> sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
>>
>> Questions:
>> 1. Alternatively, we could get only the number of dumpable pages, then
>> calculate the estimated vmcore size with a predefined factor for kdump
>> compressed dumping. E.g. for an lzo dump, if we assume a compression ratio
>> of 45%, the estimated size is: (number of dumpable pages) * 4096 * 45%.
>>
>> This is easier but rather rough. Does anybody prefer this over the real
>> dumping implemented in this draft patch?
>>
>> 2. When dumping /proc/kcore there is still a bug I haven't been able to
>> fix. For an ELF dump, write_elf_header() pre-calculates num_loads_dumpfile,
>> the number of program segments that will be dumped. However, since the
>> content of /proc/kcore is dynamic, the final num_loads_dumpfile may change
>> by the time write_elf_pages_cyclic()/write_elf_pages() is called. This
>> leaves the dumped ELF file with a bad file format. If you execute
>> "readelf -a /var/crash/kcore-dump", you will be a little surprised.
>>
>> 3. This is not a formal patch. Once the final solution is decided, I will
>> post a patch, maybe a patchset. If you have suggestions about the code or
>> implementation, please post your comments.
>>
>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> ---
>>  elf_info.c     | 136 ++++++++++++++++--
>>  elf_info.h     |  17 +++
>>  makedumpfile.c | 438 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
>>  makedumpfile.h |   5 +
>>  4 files changed, 560 insertions(+), 36 deletions(-)
>>
>> diff --git a/elf_info.c b/elf_info.c
>> index b277f69..1b05ad1 100644
>> --- a/elf_info.c
>> +++ b/elf_info.c
>> @@ -36,16 +36,9 @@
>>
>>  #define XEN_ELFNOTE_CRASH_INFO	(0x1000001)
>>
>> -struct pt_load_segment {
>> -	off_t			file_offset;
>> -	unsigned long long	phys_start;
>> -	unsigned long long	phys_end;
>> -	unsigned long long	virt_start;
>> -	unsigned long long	virt_end;
>> -};
>>
>>  static int			nr_cpus;             /* number of cpu */
>> -static off_t			max_file_offset;
>> +off_t			max_file_offset;
>>
>>  /*
>>   * File information about /proc/vmcore:
>> @@ -60,9 +53,9 @@ static int			flags_memory;
>>  /*
>>   * PT_LOAD information about /proc/vmcore:
>>   */
>> -static unsigned int		num_pt_loads;
>> -static struct pt_load_segment	*pt_loads;
>> -static off_t			offset_pt_load_memory;
>> +unsigned int		num_pt_loads;
>> +struct pt_load_segment	*pt_loads;
>> +off_t			offset_pt_load_memory;
>>
>>  /*
>>   * PT_NOTE information about /proc/vmcore:
>> @@ -395,7 +388,49 @@ get_pt_note_info(void)
>>  	return TRUE;
>>  }
>>
>> +#define UNINITIALIZED  ((ulong)(-1))
>>
>> +#define SEEK_ERROR       (-1)
>> +#define READ_ERROR       (-2)
>> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len)
>> +{
>> +	int i;
>> +	ulong kvaddr;
>> +	Elf64_Nhdr *note64;
>> +	off_t offset;
>> +	char note[MAX_SIZE_NHDR];
>> +	int size_desc;
>> +	off_t offset_desc;
>> +
>> +	offset = UNINITIALIZED;
>> +	kvaddr = (ulong)vmcoreinfo_addr | PAGE_OFFSET;
>> +
>> +	for (i = 0; i < num_pt_loads; ++i) {
>> +		struct pt_load_segment *p = &pt_loads[i];
>> +		if ((kvaddr >= p->virt_start) && (kvaddr < p->virt_end)) {
>> +			offset = (off_t)(kvaddr - p->virt_start) +
>> +			(off_t)p->file_offset;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (offset == UNINITIALIZED)
>> +		return SEEK_ERROR;
>> +
>> +        if (lseek(fd_memory, offset, SEEK_SET) != offset)
>> +		perror("lseek");
>> +
>> +	if (read(fd_memory, note, MAX_SIZE_NHDR) != MAX_SIZE_NHDR)
>> +		return READ_ERROR;
>> +
>> +	note64 = (Elf64_Nhdr *)note;
>> +	size_desc   = note_descsz(note);
>> +	offset_desc = offset + offset_note_desc(note);
>> +
>> +	set_vmcoreinfo(offset_desc, size_desc);
>> +
>> +	return 0;
>> +}
>>  /*
>>   * External functions.
>>   */
>> @@ -681,6 +716,55 @@ get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr)
>>  	return TRUE;
>>  }
>>
>> +int
>> +get_elf_loads(int fd, char *filename)
>> +{
>> +	int i, j, phnum, elf_format;
>> +	Elf64_Phdr phdr;
>> +
>> +	/*
>> +	 * Check ELF64 or ELF32.
>> +	 */
>> +	elf_format = check_elf_format(fd, filename, &phnum, &num_pt_loads);
>> +	if (elf_format == ELF64)
>> +		flags_memory |= MEMORY_ELF64;
>> +	else if (elf_format != ELF32)
>> +		return FALSE;
>> +
>> +	if (!num_pt_loads) {
>> +		ERRMSG("Can't get the number of PT_LOAD.\n");
>> +		return FALSE;
>> +	}
>> +
>> +	/*
>> +	 * The below file information will be used as /proc/vmcore.
>> +	 */
>> +	fd_memory   = fd;
>> +	name_memory = filename;
>> +
>> +	pt_loads = calloc(sizeof(struct pt_load_segment), num_pt_loads);
>> +	if (pt_loads == NULL) {
>> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
>> +		    strerror(errno));
>> +		return FALSE;
>> +	}
>> +	for (i = 0, j = 0; i < phnum; i++) {
>> +		if (!get_phdr_memory(i, &phdr))
>> +			return FALSE;
>> +
>> +		if (phdr.p_type != PT_LOAD)
>> +			continue;
>> +
>> +		if (j >= num_pt_loads)
>> +			return FALSE;
>> +		if(!dump_Elf_load(&phdr, j))
>> +			return FALSE;
>> +		j++;
>> +	}
>> +
>> +	return TRUE;
>> +}
>> +
>>  /*
>>   * Get ELF information about /proc/vmcore.
>>   */
>> @@ -826,6 +910,36 @@ get_phdr_memory(int index, Elf64_Phdr *phdr)
>>  	return TRUE;
>>  }
>>
>> +int
>> +get_phdr_load(int index, Elf64_Phdr *phdr)
>> +{
>> +	Elf32_Phdr phdr32;
>> +
>> +	if (is_elf64_memory()) { /* ELF64 */
>> +		phdr->p_type = PT_LOAD;
>> +		phdr->p_vaddr = pt_loads[index].virt_start;
>> +		phdr->p_paddr = pt_loads[index].phys_start;
>> +		phdr->p_memsz  = pt_loads[index].phys_end - pt_loads[index].phys_start;
>> +		phdr->p_filesz = phdr->p_memsz;
>> +		phdr->p_offset = pt_loads[index].file_offset;
>> +	} else {
>> +		if (!get_elf32_phdr(fd_memory, name_memory, index, &phdr32)) {
>> +			ERRMSG("Can't find Phdr %d.\n", index);
>> +			return FALSE;
>> +		}
>> +		memset(phdr, 0, sizeof(Elf64_Phdr));
>> +		phdr->p_type   = phdr32.p_type;
>> +		phdr->p_flags  = phdr32.p_flags;
>> +		phdr->p_offset = phdr32.p_offset;
>> +		phdr->p_vaddr  = phdr32.p_vaddr;
>> +		phdr->p_paddr  = phdr32.p_paddr;
>> +		phdr->p_filesz = phdr32.p_filesz;
>> +		phdr->p_memsz  = phdr32.p_memsz;
>> +		phdr->p_align  = phdr32.p_align;
>> +	}
>> +	return TRUE;
>> +}
>> +
>>  off_t
>>  get_offset_pt_load_memory(void)
>>  {
>> diff --git a/elf_info.h b/elf_info.h
>> index 801faff..0c67d74 100644
>> --- a/elf_info.h
>> +++ b/elf_info.h
>> @@ -27,6 +27,19 @@
>>
>>  #define MAX_SIZE_NHDR	MAX(sizeof(Elf64_Nhdr), sizeof(Elf32_Nhdr))
>>
>> +struct pt_load_segment {
>> +	off_t			file_offset;
>> +	unsigned long long	phys_start;
>> +	unsigned long long	phys_end;
>> +	unsigned long long	virt_start;
>> +	unsigned long long	virt_end;
>> +};
>> +
>> +extern off_t			max_file_offset;
>> +extern unsigned int		num_pt_loads;
>> +extern struct pt_load_segment	*pt_loads;
>> +
>> +extern off_t			offset_pt_load_memory;
>>
>>  off_t paddr_to_offset(unsigned long long paddr);
>>  off_t paddr_to_offset2(unsigned long long paddr, off_t hint);
>> @@ -44,11 +57,14 @@ int get_elf64_ehdr(int fd, char *filename, Elf64_Ehdr *ehdr);
>>  int get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr);
>>  int get_elf_info(int fd, char *filename);
>>  void free_elf_info(void);
>> +int get_elf_loads(int fd, char *filename);
>>
>>  int is_elf64_memory(void);
>>  int is_xen_memory(void);
>>
>>  int get_phnum_memory(void);
>> +
>> +int get_phdr_load(int index, Elf64_Phdr *phdr);
>>  int get_phdr_memory(int index, Elf64_Phdr *phdr);
>>  off_t get_offset_pt_load_memory(void);
>>  int get_pt_load(int idx,
>> @@ -68,6 +84,7 @@ void get_pt_note(off_t *offset, unsigned long *size);
>>  int has_vmcoreinfo(void);
>>  void set_vmcoreinfo(off_t offset, unsigned long size);
>>  void get_vmcoreinfo(off_t *offset, unsigned long *size);
>> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len);
>>
>>  int has_vmcoreinfo_xen(void);
>>  void get_vmcoreinfo_xen(off_t *offset, unsigned long *size);
>> diff --git a/makedumpfile.c b/makedumpfile.c
>> index 34db997..ac02747 100644
>> --- a/makedumpfile.c
>> +++ b/makedumpfile.c
>> @@ -5146,6 +5146,7 @@ create_dump_bitmap(void)
>>
>>  	if (info->flag_cyclic) {
>>
>> +		printf("create_dump_bitmap flag_cyclic\n");
>>  		if (info->flag_elf_dumpfile) {
>>  			if (!prepare_bitmap_buffer_cyclic())
>>  				goto out;
>> @@ -5189,14 +5190,23 @@ get_loads_dumpfile(void)
>>
>>  	initialize_2nd_bitmap(&bitmap2);
>>
>> -	if (!(phnum = get_phnum_memory()))
>> -		return FALSE;
>> -
>> -	for (i = 0; i < phnum; i++) {
>> -		if (!get_phdr_memory(i, &load))
>> +	if (info->flag_vmcore_estimate) {
>> +		phnum = num_pt_loads;
>> +	} else {
>> +		if (!(phnum = get_phnum_memory()))
>>  			return FALSE;
>> -		if (load.p_type != PT_LOAD)
>> -			continue;
>> +	}
>> +
>> +	for (i = 0; i < num_pt_loads; i++) {
>> +		if (info->flag_vmcore_estimate) {
>> +			get_phdr_load(i , &load);
>> +		} else {
>> +			if (!get_phdr_memory(i, &load))
>> +				return FALSE;
>> +
>> +			if (load.p_type != PT_LOAD)
>> +				continue;
>> +		}
>>
>>  		pfn_start = paddr_to_pfn(load.p_paddr);
>>  		pfn_end   = paddr_to_pfn(load.p_paddr + load.p_memsz);
>> @@ -5734,17 +5744,26 @@ write_elf_pages(struct cache_data *cd_header, struct cache_data *cd_page)
>>  	off_seg_load    = info->offset_load_dumpfile;
>>  	cd_page->offset = info->offset_load_dumpfile;
>>
>> -	if (!(phnum = get_phnum_memory()))
>> -		return FALSE;
>> +	if (info->flag_vmcore_estimate) {
>> +		phnum = num_pt_loads;
>> +	} else {
>> +		if (!(phnum = get_phnum_memory()))
>> +			return FALSE;
>> +	}
>>
>>  	gettimeofday(&tv_start, NULL);
>>
>>  	for (i = 0; i < phnum; i++) {
>> -		if (!get_phdr_memory(i, &load))
>> -			return FALSE;
>> +		if (info->flag_vmcore_estimate) {
>> +			memset(&load, 0, sizeof(load));
>> +			get_phdr_load(i , &load);
>> +		} else {
>> +			if (!get_phdr_memory(i, &load))
>> +				return FALSE;
>>
>> -		if (load.p_type != PT_LOAD)
>> -			continue;
>> +			if (load.p_type != PT_LOAD)
>> +				continue;
>> +		}
>>
>>  		off_memory= load.p_offset;
>>  		paddr     = load.p_paddr;
>> @@ -5923,14 +5942,24 @@ get_loads_dumpfile_cyclic(void)
>>  	Elf64_Phdr load;
>>  	struct cycle cycle = {0};
>>
>> -	if (!(phnum = get_phnum_memory()))
>> -		return FALSE;
>> +	if (info->flag_vmcore_estimate) {
>> +		phnum = num_pt_loads;
>> +	} else {
>> +		if (!(phnum = get_phnum_memory()))
>> +			return FALSE;
>> +	}
>>
>>  	for (i = 0; i < phnum; i++) {
>> -		if (!get_phdr_memory(i, &load))
>> -			return FALSE;
>> -		if (load.p_type != PT_LOAD)
>> -			continue;
>> +		if (info->flag_vmcore_estimate) {
>> +			memset(&load, 0, sizeof(load) );
>> +			get_phdr_load(i , &load);
>> +		} else {
>> +			if (!get_phdr_memory(i, &load))
>> +				return FALSE;
>> +
>> +			if (load.p_type != PT_LOAD)
>> +				continue;
>> +		}
>>
>>  		pfn_start = paddr_to_pfn(load.p_paddr);
>>  		pfn_end = paddr_to_pfn(load.p_paddr + load.p_memsz);
>> @@ -6016,17 +6045,26 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
>>  	pfn_user = pfn_free = pfn_hwpoison = 0;
>>  	pfn_memhole = info->max_mapnr;
>>
>> -	if (!(phnum = get_phnum_memory()))
>> -		return FALSE;
>> +	if (info->flag_vmcore_estimate) {
>> +		phnum = num_pt_loads;
>> +	} else {
>> +		if (!(phnum = get_phnum_memory()))
>> +			return FALSE;
>> +	}
>>
>>  	gettimeofday(&tv_start, NULL);
>>
>>  	for (i = 0; i < phnum; i++) {
>> -		if (!get_phdr_memory(i, &load))
>> -			return FALSE;
>> +		if (info->flag_vmcore_estimate) {
>> +			memset(&load, 0, sizeof(load));
>> +			get_phdr_load(i , &load);
>> +		} else {
>> +			if (!get_phdr_memory(i, &load))
>> +				return FALSE;
>>
>> -		if (load.p_type != PT_LOAD)
>> -			continue;
>> +			if (load.p_type != PT_LOAD)
>> +				continue;
>> +		}
>>
>>  		off_memory= load.p_offset;
>>  		paddr = load.p_paddr;
>> @@ -8929,6 +8967,13 @@ check_param_for_creating_dumpfile(int argc, char *argv[])
>>  		 */
>>  		info->name_memory   = argv[optind];
>>
>> +	} else if ((argc == optind + 2) && info->flag_vmcore_estimate) {
>> +		/*
>> +		 * Parameters for get the /proc/kcore to estimate
>> +		 * the size of dumped vmcore
>> +		 */
>> +		info->name_memory   = argv[optind];
>> +		info->name_dumpfile = argv[optind+1];
>>  	} else
>>  		return FALSE;
>>
>> @@ -9011,6 +9056,332 @@ out:
>>  	return free_size;
>>  }
>>
>> +struct memory_range {
>> +        unsigned long long start, end;
>> +};
>> +
>> +#define CRASH_RESERVED_MEM_NR   8
>> +static struct memory_range crash_reserved_mem[CRASH_RESERVED_MEM_NR];
>> +static int crash_reserved_mem_nr;
>> +
>> +/*
>> + * iomem_for_each_line()
>> + *
>> + * Iterate over each line in the file returned by proc_iomem(). If match is
>> + * NULL or if the line matches with our match-pattern then call the
>> + * callback if non-NULL.
>> + *
>> + * Return the number of lines matched.
>> + */
>> +int iomem_for_each_line(char *match,
>> +			      int (*callback)(void *data,
>> +					      int nr,
>> +					      char *str,
>> +					      unsigned long base,
>> +					      unsigned long length),
>> +			      void *data)
>> +{
>> +	const char iomem[] = "/proc/iomem";
>> +	char line[MAX_LINE];
>> +	FILE *fp;
>> +	unsigned long long start, end, size;
>> +	char *str;
>> +	int consumed;
>> +	int count;
>> +	int nr = 0;
>> +
>> +	fp = fopen(iomem, "r");
>> +	if (!fp) {
>> +		ERRMSG("Cannot open %s\n", iomem);
>> +		exit(1);
>> +	}
>> +
>> +	while(fgets(line, sizeof(line), fp) != 0) {
>> +		count = sscanf(line, "%Lx-%Lx : %n", &start, &end, &consumed);
>> +		if (count != 2)
>> +			continue;
>> +		str = line + consumed;
>> +		size = end - start + 1;
>> +		if (!match || memcmp(str, match, strlen(match)) == 0) {
>> +			if (callback
>> +			    && callback(data, nr, str, start, size) < 0) {
>> +				break;
>> +			}
>> +			nr++;
>> +		}
>> +	}
>> +
>> +	fclose(fp);
>> +
>> +	return nr;
>> +}
>> +
>> +static int crashkernel_mem_callback(void *data, int nr,
>> +                                          char *str,
>> +                                          unsigned long base,
>> +                                          unsigned long length)
>> +{
>> +        if (nr >= CRASH_RESERVED_MEM_NR)
>> +                return 1;
>> +
>> +        crash_reserved_mem[nr].start = base;
>> +        crash_reserved_mem[nr].end   = base + length - 1;
>> +        return 0;
>> +}
>> +
>> +int is_crashkernel_mem_reserved(void)
>> +{
>> +        int ret;
>> +
>> +        ret = iomem_for_each_line("Crash kernel\n",
>> +                                        crashkernel_mem_callback, NULL);
>> +        crash_reserved_mem_nr = ret;
>> +
>> +        return !!crash_reserved_mem_nr;
>> +}
>> +
>> +/* Returns the physical address of start of crash notes buffer for a kernel. */
>> +static int get_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
>> +{
>> +	char line[MAX_LINE];
>> +	int count;
>> +	FILE *fp;
>> +	unsigned long long temp, temp2;
>> +
>> +	*addr = 0;
>> +	*len = 0;
>> +
>> +	if (!(fp = fopen("/sys/kernel/vmcoreinfo", "r")))
>> +		return -1;
>> +
>> +	if (!fgets(line, sizeof(line), fp))
>> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
>> +	count = sscanf(line, "%Lx %Lx", &temp, &temp2);
>> +	if (count != 2)
>> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
>> +
>> +	*addr = (uint64_t) temp;
>> +	*len = (uint64_t) temp2;
>> +
>> +	fclose(fp);
>> +	return 0;
>> +}
>> +
>> +
>> +static int exclude_segment(struct pt_load_segment **pt_loads, unsigned int	*num_pt_loads, uint64_t start, uint64_t end)
>> +{
>> +        int i, j, tidx = -1;
>> +	unsigned long long	vstart, vend, kvstart, kvend;
>> +        struct pt_load_segment temp_seg = {0};
>> +	kvstart = (ulong)start | PAGE_OFFSET;
>> +	kvend = (ulong)end | PAGE_OFFSET;
>> +	unsigned long size;
>> +
>> +        for (i = 0; i < (*num_pt_loads); i++) {
>> +                vstart = (*pt_loads)[i].virt_start;
>> +                vend = (*pt_loads)[i].virt_end;
>> +                if (kvstart <  vend && kvend > vstart) {
>> +                        if (kvstart != vstart && kvend != vend) {
>> +				/* Split load segment */
>> +				temp_seg.phys_start = end +1;
>> +				temp_seg.phys_end = (*pt_loads)[i].phys_end;
>> +				temp_seg.virt_start = kvend + 1;
>> +				temp_seg.virt_end = vend;
>> +				temp_seg.file_offset = (*pt_loads)[i].file_offset + temp_seg.virt_start - (*pt_loads)[i].virt_start;
>> +
>> +				(*pt_loads)[i].virt_end = kvstart - 1;
>> +				(*pt_loads)[i].phys_end =  start -1;
>> +
>> +				tidx = i+1;
>> +                        } else if (kvstart != vstart) {
>> +				(*pt_loads)[i].phys_end = start - 1;
>> +				(*pt_loads)[i].virt_end = kvstart - 1;
>> +                        } else {
>> +				(*pt_loads)[i].phys_start = end + 1;
>> +				(*pt_loads)[i].virt_start = kvend + 1;
>> +                        }
>> +                }
>> +        }
>> +        /* Insert split load segment, if any. */
>> +	if (tidx >= 0) {
>> +		size = (*num_pt_loads + 1) * sizeof((*pt_loads)[0]);
>> +		(*pt_loads) = realloc((*pt_loads), size);
>> +		if  (!(*pt_loads) ) {
>> +		    ERRMSG("Cannot realloc %ld bytes: %s\n",
>> +		            size + 0UL, strerror(errno));
>> +			exit(1);
>> +		}
>> +		for (j = (*num_pt_loads - 1); j >= tidx; j--)
>> +		        (*pt_loads)[j+1] = (*pt_loads)[j];
>> +		(*pt_loads)[tidx] = temp_seg;
>> +		(*num_pt_loads)++;
>> +        }
>> +        return 0;
>> +}
>> +
>> +static int
>> +process_dump_load(struct pt_load_segment	*pls)
>> +{
>> +	unsigned long long paddr;
>> +
>> +	paddr = vaddr_to_paddr(pls->virt_start);
>> +	pls->phys_start  = paddr;
>> +	pls->phys_end    = paddr + (pls->virt_end - pls->virt_start);
>> +	MSG("process_dump_load\n");
>> +	MSG("  phys_start : %llx\n", pls->phys_start);
>> +	MSG("  phys_end   : %llx\n", pls->phys_end);
>> +	MSG("  virt_start : %llx\n", pls->virt_start);
>> +	MSG("  virt_end   : %llx\n", pls->virt_end);
>> +
>> +	return TRUE;
>> +}
>> +
>> +int get_kcore_dump_loads()
>> +{
>> +	struct pt_load_segment	*pls;
>> +	int i, j, loads=0;
>> +	unsigned long long paddr;
>> +
>> +	for (i = 0; i < num_pt_loads; ++i) {
>> +		struct pt_load_segment *p = &pt_loads[i];
>> +		if (is_vmalloc_addr(p->virt_start))
>> +			continue;
>> +		loads++;
>> +	}
>> +
>> +	pls = calloc(sizeof(struct pt_load_segment), j);
>> +	if (pls == NULL) {
>> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
>> +		    strerror(errno));
>> +		return FALSE;
>> +	}
>> +
>> +	for (i = 0, j=0; i < num_pt_loads; ++i) {
>> +		struct pt_load_segment *p = &pt_loads[i];
>> +		if (is_vmalloc_addr(p->virt_start))
>> +			continue;
>> +		if (j >= loads)
>> +			return FALSE;
>> +
>> +		if (j == 0) {
>> +			offset_pt_load_memory = p->file_offset;
>> +			if (offset_pt_load_memory == 0) {
>> +				ERRMSG("Can't get the offset of page data.\n");
>> +				return FALSE;
>> +			}
>> +		}
>> +
>> +		pls[j] = *p;
>> +		process_dump_load(&pls[j]);
>> +		j++;
>> +	}
>> +
>> +	free(pt_loads);
>> +	pt_loads = pls;
>> +	num_pt_loads = loads;
>> +
>> +	for (i=0; i<crash_reserved_mem_nr; i++)
>> +	{
>> +		exclude_segment(&pt_loads, &num_pt_loads, crash_reserved_mem[i].start, crash_reserved_mem[i].end);
>> +	}
>> +
>> +	max_file_offset = 0;
>> +	for (i = 0; i < num_pt_loads; ++i) {
>> +		struct pt_load_segment *p = &pt_loads[i];
>> +		max_file_offset = MAX(max_file_offset,
>> +				      p->file_offset + p->phys_end - p->phys_start);
>> +	}
>> +
>> +	for (i = 0; i < num_pt_loads; ++i) {
>> +		struct pt_load_segment *p = &pt_loads[i];
>> +		MSG("LOAD (%d)\n", i);
>> +		MSG("  phys_start : %llx\n", p->phys_start);
>> +		MSG("  phys_end   : %llx\n", p->phys_end);
>> +		MSG("  virt_start : %llx\n", p->virt_start);
>> +		MSG("  virt_end   : %llx\n", p->virt_end);
>> +	}
>> +
>> +	return TRUE;
>> +}
>> +
>> +int get_page_offset()
>> +{
>> +	struct utsname utsname;
>> +	if (uname(&utsname)) {
>> +		ERRMSG("Cannot get name and information about current kernel : %s", strerror(errno));
>> +		return FALSE;
>> +	}
>> +
>> +	info->kernel_version = get_kernel_version(utsname.release);
>> +	get_versiondep_info_x86_64();
>> +	return TRUE;
>> +}
>> +
>> +int vmcore_estimate(void)
>> +{
>> +	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
>> +	int num_retry, status;
>> +
>> +	if (!is_crashkernel_mem_reserved()) {
>> +		ERRMSG("No memory is reserved for crashkenrel!\n");
>> +		exit(1);
>> +	}
>> +
>> +	get_page_offset();
>> +
>> +#if 1
>> +	if (!open_dump_memory())
>> +		return FALSE;
>> +#endif
>> +
>> +	if (info->flag_vmcore_estimate) {
>> +		if (!get_elf_loads(info->fd_memory, info->name_memory))
>> +			return FALSE;
>> +	}
>> +
>> +	if (get_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
>> +		return FALSE;
>> +
>> +	if (set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
>> +		return FALSE;
>> +
>> +	if (!get_kcore_dump_loads())
>> +		return FALSE;
>> +
>> +#if 1
>> +	if (!initial())
>> +		return FALSE;
>> +#endif
>> +
>> +retry:
>> +	if (!create_dump_bitmap())
>> +		return FALSE;
>> +
>> +	if ((status = writeout_dumpfile()) == FALSE)
>> +		return FALSE;
>> +
>> +	if (status == NOSPACE) {
>> +		/*
>> +		 * If specifying the other dump_level, makedumpfile tries
>> +		 * to create a dumpfile with it again.
>> +		 */
>> +		num_retry++;
>> +		if ((info->dump_level = get_next_dump_level(num_retry)) < 0)
>> +			return FALSE;
>> +		MSG("Retry to create a dumpfile by dump_level(%d).\n",
>> +		    info->dump_level);
>> +		if (!delete_dumpfile())
>> +			return FALSE;
>> +		goto retry;
>> +	}
>> +	print_report();
>> +
>> +	clear_filter_info();
>> +	if (!close_files_for_creating_dumpfile())
>> +		return FALSE;
>> +
>> +	return TRUE;
>> +}
>>
>>  /*
>>   * Choose the lesser value of the two below as the size of cyclic buffer.
>> @@ -9063,6 +9434,7 @@ static struct option longopts[] = {
>>  	{"cyclic-buffer", required_argument, NULL, OPT_CYCLIC_BUFFER},
>>  	{"eppic", required_argument, NULL, OPT_EPPIC},
>>  	{"non-mmap", no_argument, NULL, OPT_NON_MMAP},
>> +	{"vmcore-estimate", no_argument, NULL, OPT_VMCORE_ESTIMATE},
>>  	{0, 0, 0, 0}
>>  };
>>
>> @@ -9154,6 +9526,9 @@ main(int argc, char *argv[])
>>  		case OPT_DUMP_DMESG:
>>  			info->flag_dmesg = 1;
>>  			break;
>> +		case OPT_VMCORE_ESTIMATE:
>> +			info->flag_vmcore_estimate = 1;
>> +			break;
>>  		case OPT_COMPRESS_SNAPPY:
>>  			info->flag_compress = DUMP_DH_COMPRESSED_SNAPPY;
>>  			break;
>> @@ -9294,6 +9669,19 @@ main(int argc, char *argv[])
>>
>>  		MSG("\n");
>>  		MSG("The dmesg log is saved to %s.\n", info->name_dumpfile);
>> +	} else if (info->flag_vmcore_estimate) {
>> +#if 1
>> +		if (!check_param_for_creating_dumpfile(argc, argv)) {
>> +			MSG("Commandline parameter is invalid.\n");
>> +			MSG("Try `makedumpfile --help' for more information.\n");
>> +			goto out;
>> +		}
>> +#endif
>> +		if (!vmcore_estimate())
>> +			goto out;
>> +
>> +		MSG("\n");
>> +		MSG("vmcore size estimate successfully.\n");
>>  	} else {
>>  		if (!check_param_for_creating_dumpfile(argc, argv)) {
>>  			MSG("Commandline parameter is invalid.\n");
>> diff --git a/makedumpfile.h b/makedumpfile.h
>> index 9402f05..c401337 100644
>> --- a/makedumpfile.h
>> +++ b/makedumpfile.h
>> @@ -216,6 +216,9 @@ isAnon(unsigned long mapping)
>>  #define FILENAME_STDOUT		"STDOUT"
>>  #define MAP_REGION		(4096*1024)
>>
>> +#define MAX_LINE	160
>> +
>> +
>>  /*
>>   * Minimam vmcore has 2 ProgramHeaderTables(PT_NOTE and PT_LOAD).
>>   */
>> @@ -910,6 +913,7 @@ struct DumpInfo {
>>  	int		flag_force;	     /* overwrite existing stuff */
>>  	int		flag_exclude_xen_dom;/* exclude Domain-U from xen-kdump */
>>  	int             flag_dmesg;          /* dump the dmesg log out of the vmcore file */
>> +	int             flag_vmcore_estimate;          /* estimate the size  of vmcore in current system */
>>  	int		flag_use_printk_log; /* did we read printk_log symbol name? */
>>  	int		flag_nospace;	     /* the flag of "No space on device" error */
>>  	int		flag_vmemmap;        /* kernel supports vmemmap address space */
>> @@ -1764,6 +1768,7 @@ struct elf_prstatus {
>>  #define OPT_CYCLIC_BUFFER       OPT_START+11
>>  #define OPT_EPPIC               OPT_START+12
>>  #define OPT_NON_MMAP            OPT_START+13
>> +#define OPT_VMCORE_ESTIMATE            OPT_START+14
>>
>>  /*
>>   * Function Prototype.
>> --
>> 1.8.5.3
>>
>>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-20  1:07   ` Atsushi Kumagai
@ 2014-06-20  1:58     ` bhe
  2014-06-20  2:33     ` bhe
  2014-06-23 12:57     ` Vivek Goyal
  2 siblings, 0 replies; 16+ messages in thread
From: bhe @ 2014-06-20  1:58 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec, vgoyal

On 06/20/14 at 01:07am, Atsushi Kumagai wrote:
> Hello Baoquan,
> 
> >Forget to mention only x86-64 is processed in this patch.
> >
> >On 06/11/14 at 08:39pm, Baoquan He wrote:
> >> User want to get a rough estimate of vmcore size, then they can decide
> >> how much storage space is reserved for vmcore dumping. This can help them
> >> to deploy their machines better, possibly hundreds of machines.
> 
> You suggested this feature before, but I don't still agree with this.
> 
> No one can guarantee that the vmcore size will be below the estimated
> size every time. However, if makedumpfile provides "--vmcore-estimate",
> some users may trust it completely and disk overflow might happen. 
> Ideally, users should prepare the disk which can store the possible
> maximum size of vmcore. Of course they can reduce the disk size on their
> responsibility, but makedumpfile can't help it as official feature.

Hi Atsushi,

Thanks for your comments.

In fact nobody needs to guarantee that the vmcore size will be equal to
the estimated size. As I said in the patch log, it is a rough estimate.
E.g. on a machine with 100G of memory, with "-d 31" the vmcore will be
about 400M, and with "-d 0" it will be about 2G. We know these numbers
very well, but we can't expect every system admin to know them as well.
Some of them manage tens or hundreds of machines and need to know
roughly how much disk space to reserve; of course the reserved space
will be bigger than the estimated vmcore size, so that no disk overflow
happens and not too much disk space is wasted.

What persuaded me is that the dump size of /proc/kcore can give the
user a very clear range, at least at the order-of-magnitude level. E.g.
if the kcore dump is 500M with the configured dump level, the user only
needs to reserve 1G of disk space to make sure no disk overflow happens.
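
To make that concrete, a deployment script could do something as simple
as the sketch below. This is only an illustration, not part of the
patch: the dump file path is the one from the example command in the
patch description, and the doubling factor is just the rule of thumb
above.

/* Illustrative sketch: turn the estimated dump size into a suggested
 * disk reservation.  The path and the 2x safety factor are assumptions
 * for this example only. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	struct stat st;
	const char *estimate = "/var/crash/kcore-dump"; /* written by --vmcore-estimate */

	if (stat(estimate, &st) != 0) {
		perror("stat");
		return 1;
	}
	/* Reserve roughly double the estimated size, e.g. 500M -> 1G. */
	printf("estimated vmcore: %lld MB, suggested reservation: %lld MB\n",
	       (long long)(st.st_size >> 20), (long long)(st.st_size >> 20) * 2);
	return 0;
}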

In this patch, I only take the "System RAM" regions excluding the
reserved crashkernel memory region, which is exactly the memory region
a vmcore would cover. If a crash happened at this moment, I believe the
difference wouldn't exceed 50%. To be honest, 50% is a conservative
number; on my test machine the difference is about 10%. I have to admit
it is a dynamic number.
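
For anyone who wants to check this range on a running system without
the patch, a stand-alone sketch in the spirit of the patch's
iomem_for_each_line() could look like the code below (run it as root,
since /proc/iomem may hide real addresses from ordinary users). Only
the /proc/iomem parsing mirrors the patch; the summing logic is purely
an illustration.

/* Rough sketch: total "System RAM" minus the "Crash kernel" reservation,
 * i.e. the memory a vmcore of the currently running system would cover. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *fp = fopen("/proc/iomem", "r");
	char line[160];
	unsigned long long start, end, ram = 0, crash = 0;
	int consumed;

	if (!fp) {
		perror("/proc/iomem");
		return 1;
	}
	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "%llx-%llx : %n", &start, &end, &consumed) != 2)
			continue;
		if (!strncmp(line + consumed, "System RAM", 10))
			ram += end - start + 1;
		else if (!strncmp(line + consumed, "Crash kernel", 12))
			crash += end - start + 1;
	}
	fclose(fp);
	printf("System RAM: %llu MB, Crash kernel: %llu MB, left for dump: %llu MB\n",
	       ram >> 20, crash >> 20, (ram - crash) >> 20);
	return 0;
}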

Recently several different users have asked for this RFE, so I think
it can help users a lot.

Thanks
Baoquan



> 
> 
> Thanks
> Atsushi Kumagai
> 
> >> In this draft patch, a new configuration option is added,
> >>     "--vmcore-estimate"
> >> User can execute below command to get a dumped kcore. Since kcore is a
> >> elf file to map the whole memory of current kernel, it's  equal to the
> >> memory of crash kernel though it's not exact. Content of kcore is dynamic
> >> though /proc/vmcore is fixed once crash happened. But for vmcore size
> >> estimate, it is better enough.
> >>
> >> sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
> >>
> >> Questions:
> >> 1. Or we can get the dumpable page numbers only, then calculate the estimated
> >> vmcore size by a predifined factor if it's kdump compressed dumping. E.g if
> >> lzo dump, we assume the compression ratio is 45%, then the estimate size is
> >> equal to: (dumpable page numbers) * 4096* 45%.
> >>
> >> This is easier but too rough, does anybody like this better compared with the
> >> real dumping implemented in this draft patch.
> >>
> >> 2. If dump the /proc/kcore, there's still a bug I can't fixed. When elf dump,
> >> in function write_elf_header()  it will pre-calculate a num_loads_dumpfile which
> >> is the number of program segment which will be dumped. However during dumping,
> >> the content of /proc/kcore is dynamic, the final num_loads_dumpfile may change
> >> when call write_elf_pages_cyclic/write_elf_pages(). This will cause the final
> >> dumped elf file has a bad file format. When you execute
> >> "readelf -a /var/crash/kcore-dump", you will be a little surprised.
> >>
> >> 3. This is not a formal patch, if the final solution is decided, I will post a
> >> patch, maybe a patchset. If you have suggestions about the code or implementation,
> >> please post your comment.
> >>
> >> Signed-off-by: Baoquan He <bhe@redhat.com>
> >> ---
> >>  elf_info.c     | 136 ++++++++++++++++--
> >>  elf_info.h     |  17 +++
> >>  makedumpfile.c | 438 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
> >>  makedumpfile.h |   5 +
> >>  4 files changed, 560 insertions(+), 36 deletions(-)
> >>
> >> diff --git a/elf_info.c b/elf_info.c
> >> index b277f69..1b05ad1 100644
> >> --- a/elf_info.c
> >> +++ b/elf_info.c
> >> @@ -36,16 +36,9 @@
> >>
> >>  #define XEN_ELFNOTE_CRASH_INFO	(0x1000001)
> >>
> >> -struct pt_load_segment {
> >> -	off_t			file_offset;
> >> -	unsigned long long	phys_start;
> >> -	unsigned long long	phys_end;
> >> -	unsigned long long	virt_start;
> >> -	unsigned long long	virt_end;
> >> -};
> >>
> >>  static int			nr_cpus;             /* number of cpu */
> >> -static off_t			max_file_offset;
> >> +off_t			max_file_offset;
> >>
> >>  /*
> >>   * File information about /proc/vmcore:
> >> @@ -60,9 +53,9 @@ static int			flags_memory;
> >>  /*
> >>   * PT_LOAD information about /proc/vmcore:
> >>   */
> >> -static unsigned int		num_pt_loads;
> >> -static struct pt_load_segment	*pt_loads;
> >> -static off_t			offset_pt_load_memory;
> >> +unsigned int		num_pt_loads;
> >> +struct pt_load_segment	*pt_loads;
> >> +off_t			offset_pt_load_memory;
> >>
> >>  /*
> >>   * PT_NOTE information about /proc/vmcore:
> >> @@ -395,7 +388,49 @@ get_pt_note_info(void)
> >>  	return TRUE;
> >>  }
> >>
> >> +#define UNINITIALIZED  ((ulong)(-1))
> >>
> >> +#define SEEK_ERROR       (-1)
> >> +#define READ_ERROR       (-2)
> >> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len)
> >> +{
> >> +	int i;
> >> +	ulong kvaddr;
> >> +	Elf64_Nhdr *note64;
> >> +	off_t offset;
> >> +	char note[MAX_SIZE_NHDR];
> >> +	int size_desc;
> >> +	off_t offset_desc;
> >> +
> >> +	offset = UNINITIALIZED;
> >> +	kvaddr = (ulong)vmcoreinfo_addr | PAGE_OFFSET;
> >> +
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		if ((kvaddr >= p->virt_start) && (kvaddr < p->virt_end)) {
> >> +			offset = (off_t)(kvaddr - p->virt_start) +
> >> +			(off_t)p->file_offset;
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +	if (offset == UNINITIALIZED)
> >> +		return SEEK_ERROR;
> >> +
> >> +        if (lseek(fd_memory, offset, SEEK_SET) != offset)
> >> +		perror("lseek");
> >> +
> >> +	if (read(fd_memory, note, MAX_SIZE_NHDR) != MAX_SIZE_NHDR)
> >> +		return READ_ERROR;
> >> +
> >> +	note64 = (Elf64_Nhdr *)note;
> >> +	size_desc   = note_descsz(note);
> >> +	offset_desc = offset + offset_note_desc(note);
> >> +
> >> +	set_vmcoreinfo(offset_desc, size_desc);
> >> +
> >> +	return 0;
> >> +}
> >>  /*
> >>   * External functions.
> >>   */
> >> @@ -681,6 +716,55 @@ get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr)
> >>  	return TRUE;
> >>  }
> >>
> >> +int
> >> +get_elf_loads(int fd, char *filename)
> >> +{
> >> +	int i, j, phnum, elf_format;
> >> +	Elf64_Phdr phdr;
> >> +
> >> +	/*
> >> +	 * Check ELF64 or ELF32.
> >> +	 */
> >> +	elf_format = check_elf_format(fd, filename, &phnum, &num_pt_loads);
> >> +	if (elf_format == ELF64)
> >> +		flags_memory |= MEMORY_ELF64;
> >> +	else if (elf_format != ELF32)
> >> +		return FALSE;
> >> +
> >> +	if (!num_pt_loads) {
> >> +		ERRMSG("Can't get the number of PT_LOAD.\n");
> >> +		return FALSE;
> >> +	}
> >> +
> >> +	/*
> >> +	 * The below file information will be used as /proc/vmcore.
> >> +	 */
> >> +	fd_memory   = fd;
> >> +	name_memory = filename;
> >> +
> >> +	pt_loads = calloc(sizeof(struct pt_load_segment), num_pt_loads);
> >> +	if (pt_loads == NULL) {
> >> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
> >> +		    strerror(errno));
> >> +		return FALSE;
> >> +	}
> >> +	for (i = 0, j = 0; i < phnum; i++) {
> >> +		if (!get_phdr_memory(i, &phdr))
> >> +			return FALSE;
> >> +
> >> +		if (phdr.p_type != PT_LOAD)
> >> +			continue;
> >> +
> >> +		if (j >= num_pt_loads)
> >> +			return FALSE;
> >> +		if(!dump_Elf_load(&phdr, j))
> >> +			return FALSE;
> >> +		j++;
> >> +	}
> >> +
> >> +	return TRUE;
> >> +}
> >> +
> >>  /*
> >>   * Get ELF information about /proc/vmcore.
> >>   */
> >> @@ -826,6 +910,36 @@ get_phdr_memory(int index, Elf64_Phdr *phdr)
> >>  	return TRUE;
> >>  }
> >>
> >> +int
> >> +get_phdr_load(int index, Elf64_Phdr *phdr)
> >> +{
> >> +	Elf32_Phdr phdr32;
> >> +
> >> +	if (is_elf64_memory()) { /* ELF64 */
> >> +		phdr->p_type = PT_LOAD;
> >> +		phdr->p_vaddr = pt_loads[index].virt_start;
> >> +		phdr->p_paddr = pt_loads[index].phys_start;
> >> +		phdr->p_memsz  = pt_loads[index].phys_end - pt_loads[index].phys_start;
> >> +		phdr->p_filesz = phdr->p_memsz;
> >> +		phdr->p_offset = pt_loads[index].file_offset;
> >> +	} else {
> >> +		if (!get_elf32_phdr(fd_memory, name_memory, index, &phdr32)) {
> >> +			ERRMSG("Can't find Phdr %d.\n", index);
> >> +			return FALSE;
> >> +		}
> >> +		memset(phdr, 0, sizeof(Elf64_Phdr));
> >> +		phdr->p_type   = phdr32.p_type;
> >> +		phdr->p_flags  = phdr32.p_flags;
> >> +		phdr->p_offset = phdr32.p_offset;
> >> +		phdr->p_vaddr  = phdr32.p_vaddr;
> >> +		phdr->p_paddr  = phdr32.p_paddr;
> >> +		phdr->p_filesz = phdr32.p_filesz;
> >> +		phdr->p_memsz  = phdr32.p_memsz;
> >> +		phdr->p_align  = phdr32.p_align;
> >> +	}
> >> +	return TRUE;
> >> +}
> >> +
> >>  off_t
> >>  get_offset_pt_load_memory(void)
> >>  {
> >> diff --git a/elf_info.h b/elf_info.h
> >> index 801faff..0c67d74 100644
> >> --- a/elf_info.h
> >> +++ b/elf_info.h
> >> @@ -27,6 +27,19 @@
> >>
> >>  #define MAX_SIZE_NHDR	MAX(sizeof(Elf64_Nhdr), sizeof(Elf32_Nhdr))
> >>
> >> +struct pt_load_segment {
> >> +	off_t			file_offset;
> >> +	unsigned long long	phys_start;
> >> +	unsigned long long	phys_end;
> >> +	unsigned long long	virt_start;
> >> +	unsigned long long	virt_end;
> >> +};
> >> +
> >> +extern off_t			max_file_offset;
> >> +extern unsigned int		num_pt_loads;
> >> +extern struct pt_load_segment	*pt_loads;
> >> +
> >> +extern off_t			offset_pt_load_memory;
> >>
> >>  off_t paddr_to_offset(unsigned long long paddr);
> >>  off_t paddr_to_offset2(unsigned long long paddr, off_t hint);
> >> @@ -44,11 +57,14 @@ int get_elf64_ehdr(int fd, char *filename, Elf64_Ehdr *ehdr);
> >>  int get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr);
> >>  int get_elf_info(int fd, char *filename);
> >>  void free_elf_info(void);
> >> +int get_elf_loads(int fd, char *filename);
> >>
> >>  int is_elf64_memory(void);
> >>  int is_xen_memory(void);
> >>
> >>  int get_phnum_memory(void);
> >> +
> >> +int get_phdr_load(int index, Elf64_Phdr *phdr);
> >>  int get_phdr_memory(int index, Elf64_Phdr *phdr);
> >>  off_t get_offset_pt_load_memory(void);
> >>  int get_pt_load(int idx,
> >> @@ -68,6 +84,7 @@ void get_pt_note(off_t *offset, unsigned long *size);
> >>  int has_vmcoreinfo(void);
> >>  void set_vmcoreinfo(off_t offset, unsigned long size);
> >>  void get_vmcoreinfo(off_t *offset, unsigned long *size);
> >> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len);
> >>
> >>  int has_vmcoreinfo_xen(void);
> >>  void get_vmcoreinfo_xen(off_t *offset, unsigned long *size);
> >> diff --git a/makedumpfile.c b/makedumpfile.c
> >> index 34db997..ac02747 100644
> >> --- a/makedumpfile.c
> >> +++ b/makedumpfile.c
> >> @@ -5146,6 +5146,7 @@ create_dump_bitmap(void)
> >>
> >>  	if (info->flag_cyclic) {
> >>
> >> +		printf("create_dump_bitmap flag_cyclic\n");
> >>  		if (info->flag_elf_dumpfile) {
> >>  			if (!prepare_bitmap_buffer_cyclic())
> >>  				goto out;
> >> @@ -5189,14 +5190,23 @@ get_loads_dumpfile(void)
> >>
> >>  	initialize_2nd_bitmap(&bitmap2);
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> -
> >> -	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >>  			return FALSE;
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +	}
> >> +
> >> +	for (i = 0; i < num_pt_loads; i++) {
> >> +		if (info->flag_vmcore_estimate) {
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >> +
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		pfn_start = paddr_to_pfn(load.p_paddr);
> >>  		pfn_end   = paddr_to_pfn(load.p_paddr + load.p_memsz);
> >> @@ -5734,17 +5744,26 @@ write_elf_pages(struct cache_data *cd_header, struct cache_data *cd_page)
> >>  	off_seg_load    = info->offset_load_dumpfile;
> >>  	cd_page->offset = info->offset_load_dumpfile;
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >> +			return FALSE;
> >> +	}
> >>
> >>  	gettimeofday(&tv_start, NULL);
> >>
> >>  	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> -			return FALSE;
> >> +		if (info->flag_vmcore_estimate) {
> >> +			memset(&load, 0, sizeof(load));
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >>
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		off_memory= load.p_offset;
> >>  		paddr     = load.p_paddr;
> >> @@ -5923,14 +5942,24 @@ get_loads_dumpfile_cyclic(void)
> >>  	Elf64_Phdr load;
> >>  	struct cycle cycle = {0};
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >> +			return FALSE;
> >> +	}
> >>
> >>  	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> -			return FALSE;
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +		if (info->flag_vmcore_estimate) {
> >> +			memset(&load, 0, sizeof(load) );
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >> +
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		pfn_start = paddr_to_pfn(load.p_paddr);
> >>  		pfn_end = paddr_to_pfn(load.p_paddr + load.p_memsz);
> >> @@ -6016,17 +6045,26 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
> >>  	pfn_user = pfn_free = pfn_hwpoison = 0;
> >>  	pfn_memhole = info->max_mapnr;
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >> +			return FALSE;
> >> +	}
> >>
> >>  	gettimeofday(&tv_start, NULL);
> >>
> >>  	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> -			return FALSE;
> >> +		if (info->flag_vmcore_estimate) {
> >> +			memset(&load, 0, sizeof(load));
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >>
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		off_memory= load.p_offset;
> >>  		paddr = load.p_paddr;
> >> @@ -8929,6 +8967,13 @@ check_param_for_creating_dumpfile(int argc, char *argv[])
> >>  		 */
> >>  		info->name_memory   = argv[optind];
> >>
> >> +	} else if ((argc == optind + 2) && info->flag_vmcore_estimate) {
> >> +		/*
> >> +		 * Parameters for get the /proc/kcore to estimate
> >> +		 * the size of dumped vmcore
> >> +		 */
> >> +		info->name_memory   = argv[optind];
> >> +		info->name_dumpfile = argv[optind+1];
> >>  	} else
> >>  		return FALSE;
> >>
> >> @@ -9011,6 +9056,332 @@ out:
> >>  	return free_size;
> >>  }
> >>
> >> +struct memory_range {
> >> +        unsigned long long start, end;
> >> +};
> >> +
> >> +#define CRASH_RESERVED_MEM_NR   8
> >> +static struct memory_range crash_reserved_mem[CRASH_RESERVED_MEM_NR];
> >> +static int crash_reserved_mem_nr;
> >> +
> >> +/*
> >> + * iomem_for_each_line()
> >> + *
> >> + * Iterate over each line in the file returned by proc_iomem(). If match is
> >> + * NULL or if the line matches with our match-pattern then call the
> >> + * callback if non-NULL.
> >> + *
> >> + * Return the number of lines matched.
> >> + */
> >> +int iomem_for_each_line(char *match,
> >> +			      int (*callback)(void *data,
> >> +					      int nr,
> >> +					      char *str,
> >> +					      unsigned long base,
> >> +					      unsigned long length),
> >> +			      void *data)
> >> +{
> >> +	const char iomem[] = "/proc/iomem";
> >> +	char line[MAX_LINE];
> >> +	FILE *fp;
> >> +	unsigned long long start, end, size;
> >> +	char *str;
> >> +	int consumed;
> >> +	int count;
> >> +	int nr = 0;
> >> +
> >> +	fp = fopen(iomem, "r");
> >> +	if (!fp) {
> >> +		ERRMSG("Cannot open %s\n", iomem);
> >> +		exit(1);
> >> +	}
> >> +
> >> +	while(fgets(line, sizeof(line), fp) != 0) {
> >> +		count = sscanf(line, "%Lx-%Lx : %n", &start, &end, &consumed);
> >> +		if (count != 2)
> >> +			continue;
> >> +		str = line + consumed;
> >> +		size = end - start + 1;
> >> +		if (!match || memcmp(str, match, strlen(match)) == 0) {
> >> +			if (callback
> >> +			    && callback(data, nr, str, start, size) < 0) {
> >> +				break;
> >> +			}
> >> +			nr++;
> >> +		}
> >> +	}
> >> +
> >> +	fclose(fp);
> >> +
> >> +	return nr;
> >> +}
> >> +
> >> +static int crashkernel_mem_callback(void *data, int nr,
> >> +                                          char *str,
> >> +                                          unsigned long base,
> >> +                                          unsigned long length)
> >> +{
> >> +        if (nr >= CRASH_RESERVED_MEM_NR)
> >> +                return 1;
> >> +
> >> +        crash_reserved_mem[nr].start = base;
> >> +        crash_reserved_mem[nr].end   = base + length - 1;
> >> +        return 0;
> >> +}
> >> +
> >> +int is_crashkernel_mem_reserved(void)
> >> +{
> >> +        int ret;
> >> +
> >> +        ret = iomem_for_each_line("Crash kernel\n",
> >> +                                        crashkernel_mem_callback, NULL);
> >> +        crash_reserved_mem_nr = ret;
> >> +
> >> +        return !!crash_reserved_mem_nr;
> >> +}
> >> +
> >> +/* Returns the physical address of start of crash notes buffer for a kernel. */
> >> +static int get_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
> >> +{
> >> +	char line[MAX_LINE];
> >> +	int count;
> >> +	FILE *fp;
> >> +	unsigned long long temp, temp2;
> >> +
> >> +	*addr = 0;
> >> +	*len = 0;
> >> +
> >> +	if (!(fp = fopen("/sys/kernel/vmcoreinfo", "r")))
> >> +		return -1;
> >> +
> >> +	if (!fgets(line, sizeof(line), fp))
> >> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
> >> +	count = sscanf(line, "%Lx %Lx", &temp, &temp2);
> >> +	if (count != 2)
> >> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
> >> +
> >> +	*addr = (uint64_t) temp;
> >> +	*len = (uint64_t) temp2;
> >> +
> >> +	fclose(fp);
> >> +	return 0;
> >> +}
> >> +
> >> +
> >> +static int exclude_segment(struct pt_load_segment **pt_loads, unsigned int	*num_pt_loads, uint64_t start, uint64_t end)
> >> +{
> >> +        int i, j, tidx = -1;
> >> +	unsigned long long	vstart, vend, kvstart, kvend;
> >> +        struct pt_load_segment temp_seg = {0};
> >> +	kvstart = (ulong)start | PAGE_OFFSET;
> >> +	kvend = (ulong)end | PAGE_OFFSET;
> >> +	unsigned long size;
> >> +
> >> +        for (i = 0; i < (*num_pt_loads); i++) {
> >> +                vstart = (*pt_loads)[i].virt_start;
> >> +                vend = (*pt_loads)[i].virt_end;
> >> +                if (kvstart <  vend && kvend > vstart) {
> >> +                        if (kvstart != vstart && kvend != vend) {
> >> +				/* Split load segment */
> >> +				temp_seg.phys_start = end +1;
> >> +				temp_seg.phys_end = (*pt_loads)[i].phys_end;
> >> +				temp_seg.virt_start = kvend + 1;
> >> +				temp_seg.virt_end = vend;
> >> +				temp_seg.file_offset = (*pt_loads)[i].file_offset + temp_seg.virt_start - (*pt_loads)[i].virt_start;
> >> +
> >> +				(*pt_loads)[i].virt_end = kvstart - 1;
> >> +				(*pt_loads)[i].phys_end =  start -1;
> >> +
> >> +				tidx = i+1;
> >> +                        } else if (kvstart != vstart) {
> >> +				(*pt_loads)[i].phys_end = start - 1;
> >> +				(*pt_loads)[i].virt_end = kvstart - 1;
> >> +                        } else {
> >> +				(*pt_loads)[i].phys_start = end + 1;
> >> +				(*pt_loads)[i].virt_start = kvend + 1;
> >> +                        }
> >> +                }
> >> +        }
> >> +        /* Insert split load segment, if any. */
> >> +	if (tidx >= 0) {
> >> +		size = (*num_pt_loads + 1) * sizeof((*pt_loads)[0]);
> >> +		(*pt_loads) = realloc((*pt_loads), size);
> >> +		if  (!(*pt_loads) ) {
> >> +		    ERRMSG("Cannot realloc %ld bytes: %s\n",
> >> +		            size + 0UL, strerror(errno));
> >> +			exit(1);
> >> +		}
> >> +		for (j = (*num_pt_loads - 1); j >= tidx; j--)
> >> +		        (*pt_loads)[j+1] = (*pt_loads)[j];
> >> +		(*pt_loads)[tidx] = temp_seg;
> >> +		(*num_pt_loads)++;
> >> +        }
> >> +        return 0;
> >> +}
> >> +
> >> +static int
> >> +process_dump_load(struct pt_load_segment	*pls)
> >> +{
> >> +	unsigned long long paddr;
> >> +
> >> +	paddr = vaddr_to_paddr(pls->virt_start);
> >> +	pls->phys_start  = paddr;
> >> +	pls->phys_end    = paddr + (pls->virt_end - pls->virt_start);
> >> +	MSG("process_dump_load\n");
> >> +	MSG("  phys_start : %llx\n", pls->phys_start);
> >> +	MSG("  phys_end   : %llx\n", pls->phys_end);
> >> +	MSG("  virt_start : %llx\n", pls->virt_start);
> >> +	MSG("  virt_end   : %llx\n", pls->virt_end);
> >> +
> >> +	return TRUE;
> >> +}
> >> +
> >> +int get_kcore_dump_loads()
> >> +{
> >> +	struct pt_load_segment	*pls;
> >> +	int i, j, loads=0;
> >> +	unsigned long long paddr;
> >> +
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		if (is_vmalloc_addr(p->virt_start))
> >> +			continue;
> >> +		loads++;
> >> +	}
> >> +
> >> +	pls = calloc(sizeof(struct pt_load_segment), j);
> >> +	if (pls == NULL) {
> >> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
> >> +		    strerror(errno));
> >> +		return FALSE;
> >> +	}
> >> +
> >> +	for (i = 0, j=0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		if (is_vmalloc_addr(p->virt_start))
> >> +			continue;
> >> +		if (j >= loads)
> >> +			return FALSE;
> >> +
> >> +		if (j == 0) {
> >> +			offset_pt_load_memory = p->file_offset;
> >> +			if (offset_pt_load_memory == 0) {
> >> +				ERRMSG("Can't get the offset of page data.\n");
> >> +				return FALSE;
> >> +			}
> >> +		}
> >> +
> >> +		pls[j] = *p;
> >> +		process_dump_load(&pls[j]);
> >> +		j++;
> >> +	}
> >> +
> >> +	free(pt_loads);
> >> +	pt_loads = pls;
> >> +	num_pt_loads = loads;
> >> +
> >> +	for (i=0; i<crash_reserved_mem_nr; i++)
> >> +	{
> >> +		exclude_segment(&pt_loads, &num_pt_loads, crash_reserved_mem[i].start, crash_reserved_mem[i].end);
> >> +	}
> >> +
> >> +	max_file_offset = 0;
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		max_file_offset = MAX(max_file_offset,
> >> +				      p->file_offset + p->phys_end - p->phys_start);
> >> +	}
> >> +
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		MSG("LOAD (%d)\n", i);
> >> +		MSG("  phys_start : %llx\n", p->phys_start);
> >> +		MSG("  phys_end   : %llx\n", p->phys_end);
> >> +		MSG("  virt_start : %llx\n", p->virt_start);
> >> +		MSG("  virt_end   : %llx\n", p->virt_end);
> >> +	}
> >> +
> >> +	return TRUE;
> >> +}
> >> +
> >> +int get_page_offset()
> >> +{
> >> +	struct utsname utsname;
> >> +	if (uname(&utsname)) {
> >> +		ERRMSG("Cannot get name and information about current kernel : %s", strerror(errno));
> >> +		return FALSE;
> >> +	}
> >> +
> >> +	info->kernel_version = get_kernel_version(utsname.release);
> >> +	get_versiondep_info_x86_64();
> >> +	return TRUE;
> >> +}
> >> +
> >> +int vmcore_estimate(void)
> >> +{
> >> +	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
> >> +	int num_retry, status;
> >> +
> >> +	if (!is_crashkernel_mem_reserved()) {
> >> +		ERRMSG("No memory is reserved for crashkenrel!\n");
> >> +		exit(1);
> >> +	}
> >> +
> >> +	get_page_offset();
> >> +
> >> +#if 1
> >> +	if (!open_dump_memory())
> >> +		return FALSE;
> >> +#endif
> >> +
> >> +	if (info->flag_vmcore_estimate) {
> >> +		if (!get_elf_loads(info->fd_memory, info->name_memory))
> >> +			return FALSE;
> >> +	}
> >> +
> >> +	if (get_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> >> +		return FALSE;
> >> +
> >> +	if (set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> >> +		return FALSE;
> >> +
> >> +	if (!get_kcore_dump_loads())
> >> +		return FALSE;
> >> +
> >> +#if 1
> >> +	if (!initial())
> >> +		return FALSE;
> >> +#endif
> >> +
> >> +retry:
> >> +	if (!create_dump_bitmap())
> >> +		return FALSE;
> >> +
> >> +	if ((status = writeout_dumpfile()) == FALSE)
> >> +		return FALSE;
> >> +
> >> +	if (status == NOSPACE) {
> >> +		/*
> >> +		 * If specifying the other dump_level, makedumpfile tries
> >> +		 * to create a dumpfile with it again.
> >> +		 */
> >> +		num_retry++;
> >> +		if ((info->dump_level = get_next_dump_level(num_retry)) < 0)
> >> +			return FALSE;
> >> +		MSG("Retry to create a dumpfile by dump_level(%d).\n",
> >> +		    info->dump_level);
> >> +		if (!delete_dumpfile())
> >> +			return FALSE;
> >> +		goto retry;
> >> +	}
> >> +	print_report();
> >> +
> >> +	clear_filter_info();
> >> +	if (!close_files_for_creating_dumpfile())
> >> +		return FALSE;
> >> +
> >> +	return TRUE;
> >> +}
> >>
> >>  /*
> >>   * Choose the lesser value of the two below as the size of cyclic buffer.
> >> @@ -9063,6 +9434,7 @@ static struct option longopts[] = {
> >>  	{"cyclic-buffer", required_argument, NULL, OPT_CYCLIC_BUFFER},
> >>  	{"eppic", required_argument, NULL, OPT_EPPIC},
> >>  	{"non-mmap", no_argument, NULL, OPT_NON_MMAP},
> >> +	{"vmcore-estimate", no_argument, NULL, OPT_VMCORE_ESTIMATE},
> >>  	{0, 0, 0, 0}
> >>  };
> >>
> >> @@ -9154,6 +9526,9 @@ main(int argc, char *argv[])
> >>  		case OPT_DUMP_DMESG:
> >>  			info->flag_dmesg = 1;
> >>  			break;
> >> +		case OPT_VMCORE_ESTIMATE:
> >> +			info->flag_vmcore_estimate = 1;
> >> +			break;
> >>  		case OPT_COMPRESS_SNAPPY:
> >>  			info->flag_compress = DUMP_DH_COMPRESSED_SNAPPY;
> >>  			break;
> >> @@ -9294,6 +9669,19 @@ main(int argc, char *argv[])
> >>
> >>  		MSG("\n");
> >>  		MSG("The dmesg log is saved to %s.\n", info->name_dumpfile);
> >> +	} else if (info->flag_vmcore_estimate) {
> >> +#if 1
> >> +		if (!check_param_for_creating_dumpfile(argc, argv)) {
> >> +			MSG("Commandline parameter is invalid.\n");
> >> +			MSG("Try `makedumpfile --help' for more information.\n");
> >> +			goto out;
> >> +		}
> >> +#endif
> >> +		if (!vmcore_estimate())
> >> +			goto out;
> >> +
> >> +		MSG("\n");
> >> +		MSG("vmcore size estimate successfully.\n");
> >>  	} else {
> >>  		if (!check_param_for_creating_dumpfile(argc, argv)) {
> >>  			MSG("Commandline parameter is invalid.\n");
> >> diff --git a/makedumpfile.h b/makedumpfile.h
> >> index 9402f05..c401337 100644
> >> --- a/makedumpfile.h
> >> +++ b/makedumpfile.h
> >> @@ -216,6 +216,9 @@ isAnon(unsigned long mapping)
> >>  #define FILENAME_STDOUT		"STDOUT"
> >>  #define MAP_REGION		(4096*1024)
> >>
> >> +#define MAX_LINE	160
> >> +
> >> +
> >>  /*
> >>   * Minimam vmcore has 2 ProgramHeaderTables(PT_NOTE and PT_LOAD).
> >>   */
> >> @@ -910,6 +913,7 @@ struct DumpInfo {
> >>  	int		flag_force;	     /* overwrite existing stuff */
> >>  	int		flag_exclude_xen_dom;/* exclude Domain-U from xen-kdump */
> >>  	int             flag_dmesg;          /* dump the dmesg log out of the vmcore file */
> >> +	int             flag_vmcore_estimate;          /* estimate the size  of vmcore in current system */
> >>  	int		flag_use_printk_log; /* did we read printk_log symbol name? */
> >>  	int		flag_nospace;	     /* the flag of "No space on device" error */
> >>  	int		flag_vmemmap;        /* kernel supports vmemmap address space */
> >> @@ -1764,6 +1768,7 @@ struct elf_prstatus {
> >>  #define OPT_CYCLIC_BUFFER       OPT_START+11
> >>  #define OPT_EPPIC               OPT_START+12
> >>  #define OPT_NON_MMAP            OPT_START+13
> >> +#define OPT_VMCORE_ESTIMATE            OPT_START+14
> >>
> >>  /*
> >>   * Function Prototype.
> >> --
> >> 1.8.5.3
> >>
> >>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-20  1:07   ` Atsushi Kumagai
  2014-06-20  1:58     ` bhe
@ 2014-06-20  2:33     ` bhe
  2014-06-23 12:57     ` Vivek Goyal
  2 siblings, 0 replies; 16+ messages in thread
From: bhe @ 2014-06-20  2:33 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec, vgoyal

On 06/20/14 at 01:07am, Atsushi Kumagai wrote:
> Hello Baoquan,
> 
> >Forget to mention only x86-64 is processed in this patch.
> >
> >On 06/11/14 at 08:39pm, Baoquan He wrote:
> >> User want to get a rough estimate of vmcore size, then they can decide
> >> how much storage space is reserved for vmcore dumping. This can help them
> >> to deploy their machines better, possibly hundreds of machines.
> 
> You suggested this feature before, but I don't still agree with this.
> 
> No one can guarantee that the vmcore size will be below the estimated
> size every time. However, if makedumpfile provides "--vmcore-estimate",
> some users may trust it completely and disk overflow might happen. 
> Ideally, users should prepare the disk which can store the possible
> maximum size of vmcore. Of course they can reduce the disk size on their
> responsibility, but makedumpfile can't help it as official feature.

Hi,

Currently I am hesitating between two different designs. One is dumping
/proc/kcore, so the user can get its size and check its content. The
other is just reporting the number of dumpable pages, which is what
HP-UX does.

People told me that on HP-UX, executing crashconf gives output like:

Total pages on system:            260351
Total pages included in dump:     108751

Dump compressed:    ON

Dump Parallel:    ON

This is easier and can also help the user. And the dumpable page count
is exactly what the vmcore would contain if a crash happened at this
point. Dumping kcore, on the other hand, causes file writes; that is
the difference between dumping kcore and dumping vmcore.
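
If we go with the second design, the report could stay as simple as the
crashconf output above, and a size estimate would just be derived from
the page count with an assumed compression ratio, e.g. the 45% lzo
assumption from the patch description. A rough sketch, reusing the
numbers from the output above purely as an example:

/* Illustrative only: turn a dumpable page count into a rough size
 * estimate, in the spirit of the HP-UX crashconf style report. */
#include <stdio.h>

int main(void)
{
	unsigned long long total_pages = 260351;   /* example numbers from  */
	unsigned long long dump_pages  = 108751;   /* the crashconf output  */
	unsigned long long page_size   = 4096;
	unsigned long long estimate    = dump_pages * page_size * 45 / 100;

	printf("Total pages on system:        %llu\n", total_pages);
	printf("Total pages included in dump: %llu\n", dump_pages);
	printf("Estimated compressed vmcore:  %llu MB\n", estimate >> 20);
	return 0;
}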

What's your opinion on this?

Thanks
Baoquan


> 
> 
> Thanks
> Atsushi Kumagai
> 
> >> In this draft patch, a new configuration option is added,
> >>     "--vmcore-estimate"
> >> User can execute below command to get a dumped kcore. Since kcore is a
> >> elf file to map the whole memory of current kernel, it's  equal to the
> >> memory of crash kernel though it's not exact. Content of kcore is dynamic
> >> though /proc/vmcore is fixed once crash happened. But for vmcore size
> >> estimate, it is better enough.
> >>
> >> sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
> >>
> >> Questions:
> >> 1. Or we can get the dumpable page numbers only, then calculate the estimated
> >> vmcore size by a predifined factor if it's kdump compressed dumping. E.g if
> >> lzo dump, we assume the compression ratio is 45%, then the estimate size is
> >> equal to: (dumpable page numbers) * 4096* 45%.
> >>
> >> This is easier but too rough, does anybody like this better compared with the
> >> real dumping implemented in this draft patch.
> >>
> >> 2. If dump the /proc/kcore, there's still a bug I can't fixed. When elf dump,
> >> in function write_elf_header()  it will pre-calculate a num_loads_dumpfile which
> >> is the number of program segment which will be dumped. However during dumping,
> >> the content of /proc/kcore is dynamic, the final num_loads_dumpfile may change
> >> when call write_elf_pages_cyclic/write_elf_pages(). This will cause the final
> >> dumped elf file has a bad file format. When you execute
> >> "readelf -a /var/crash/kcore-dump", you will be a little surprised.
> >>
> >> 3. This is not a formal patch, if the final solution is decided, I will post a
> >> patch, maybe a patchset. If you have suggestions about the code or implementation,
> >> please post your comment.
> >>
> >> Signed-off-by: Baoquan He <bhe@redhat.com>
> >> ---
> >>  elf_info.c     | 136 ++++++++++++++++--
> >>  elf_info.h     |  17 +++
> >>  makedumpfile.c | 438 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
> >>  makedumpfile.h |   5 +
> >>  4 files changed, 560 insertions(+), 36 deletions(-)
> >>
> >> diff --git a/elf_info.c b/elf_info.c
> >> index b277f69..1b05ad1 100644
> >> --- a/elf_info.c
> >> +++ b/elf_info.c
> >> @@ -36,16 +36,9 @@
> >>
> >>  #define XEN_ELFNOTE_CRASH_INFO	(0x1000001)
> >>
> >> -struct pt_load_segment {
> >> -	off_t			file_offset;
> >> -	unsigned long long	phys_start;
> >> -	unsigned long long	phys_end;
> >> -	unsigned long long	virt_start;
> >> -	unsigned long long	virt_end;
> >> -};
> >>
> >>  static int			nr_cpus;             /* number of cpu */
> >> -static off_t			max_file_offset;
> >> +off_t			max_file_offset;
> >>
> >>  /*
> >>   * File information about /proc/vmcore:
> >> @@ -60,9 +53,9 @@ static int			flags_memory;
> >>  /*
> >>   * PT_LOAD information about /proc/vmcore:
> >>   */
> >> -static unsigned int		num_pt_loads;
> >> -static struct pt_load_segment	*pt_loads;
> >> -static off_t			offset_pt_load_memory;
> >> +unsigned int		num_pt_loads;
> >> +struct pt_load_segment	*pt_loads;
> >> +off_t			offset_pt_load_memory;
> >>
> >>  /*
> >>   * PT_NOTE information about /proc/vmcore:
> >> @@ -395,7 +388,49 @@ get_pt_note_info(void)
> >>  	return TRUE;
> >>  }
> >>
> >> +#define UNINITIALIZED  ((ulong)(-1))
> >>
> >> +#define SEEK_ERROR       (-1)
> >> +#define READ_ERROR       (-2)
> >> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len)
> >> +{
> >> +	int i;
> >> +	ulong kvaddr;
> >> +	Elf64_Nhdr *note64;
> >> +	off_t offset;
> >> +	char note[MAX_SIZE_NHDR];
> >> +	int size_desc;
> >> +	off_t offset_desc;
> >> +
> >> +	offset = UNINITIALIZED;
> >> +	kvaddr = (ulong)vmcoreinfo_addr | PAGE_OFFSET;
> >> +
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		if ((kvaddr >= p->virt_start) && (kvaddr < p->virt_end)) {
> >> +			offset = (off_t)(kvaddr - p->virt_start) +
> >> +			(off_t)p->file_offset;
> >> +			break;
> >> +		}
> >> +	}
> >> +
> >> +	if (offset == UNINITIALIZED)
> >> +		return SEEK_ERROR;
> >> +
> >> +        if (lseek(fd_memory, offset, SEEK_SET) != offset)
> >> +		perror("lseek");
> >> +
> >> +	if (read(fd_memory, note, MAX_SIZE_NHDR) != MAX_SIZE_NHDR)
> >> +		return READ_ERROR;
> >> +
> >> +	note64 = (Elf64_Nhdr *)note;
> >> +	size_desc   = note_descsz(note);
> >> +	offset_desc = offset + offset_note_desc(note);
> >> +
> >> +	set_vmcoreinfo(offset_desc, size_desc);
> >> +
> >> +	return 0;
> >> +}
> >>  /*
> >>   * External functions.
> >>   */
> >> @@ -681,6 +716,55 @@ get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr)
> >>  	return TRUE;
> >>  }
> >>
> >> +int
> >> +get_elf_loads(int fd, char *filename)
> >> +{
> >> +	int i, j, phnum, elf_format;
> >> +	Elf64_Phdr phdr;
> >> +
> >> +	/*
> >> +	 * Check ELF64 or ELF32.
> >> +	 */
> >> +	elf_format = check_elf_format(fd, filename, &phnum, &num_pt_loads);
> >> +	if (elf_format == ELF64)
> >> +		flags_memory |= MEMORY_ELF64;
> >> +	else if (elf_format != ELF32)
> >> +		return FALSE;
> >> +
> >> +	if (!num_pt_loads) {
> >> +		ERRMSG("Can't get the number of PT_LOAD.\n");
> >> +		return FALSE;
> >> +	}
> >> +
> >> +	/*
> >> +	 * The below file information will be used as /proc/vmcore.
> >> +	 */
> >> +	fd_memory   = fd;
> >> +	name_memory = filename;
> >> +
> >> +	pt_loads = calloc(sizeof(struct pt_load_segment), num_pt_loads);
> >> +	if (pt_loads == NULL) {
> >> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
> >> +		    strerror(errno));
> >> +		return FALSE;
> >> +	}
> >> +	for (i = 0, j = 0; i < phnum; i++) {
> >> +		if (!get_phdr_memory(i, &phdr))
> >> +			return FALSE;
> >> +
> >> +		if (phdr.p_type != PT_LOAD)
> >> +			continue;
> >> +
> >> +		if (j >= num_pt_loads)
> >> +			return FALSE;
> >> +		if(!dump_Elf_load(&phdr, j))
> >> +			return FALSE;
> >> +		j++;
> >> +	}
> >> +
> >> +	return TRUE;
> >> +}
> >> +
> >>  /*
> >>   * Get ELF information about /proc/vmcore.
> >>   */
> >> @@ -826,6 +910,36 @@ get_phdr_memory(int index, Elf64_Phdr *phdr)
> >>  	return TRUE;
> >>  }
> >>
> >> +int
> >> +get_phdr_load(int index, Elf64_Phdr *phdr)
> >> +{
> >> +	Elf32_Phdr phdr32;
> >> +
> >> +	if (is_elf64_memory()) { /* ELF64 */
> >> +		phdr->p_type = PT_LOAD;
> >> +		phdr->p_vaddr = pt_loads[index].virt_start;
> >> +		phdr->p_paddr = pt_loads[index].phys_start;
> >> +		phdr->p_memsz  = pt_loads[index].phys_end - pt_loads[index].phys_start;
> >> +		phdr->p_filesz = phdr->p_memsz;
> >> +		phdr->p_offset = pt_loads[index].file_offset;
> >> +	} else {
> >> +		if (!get_elf32_phdr(fd_memory, name_memory, index, &phdr32)) {
> >> +			ERRMSG("Can't find Phdr %d.\n", index);
> >> +			return FALSE;
> >> +		}
> >> +		memset(phdr, 0, sizeof(Elf64_Phdr));
> >> +		phdr->p_type   = phdr32.p_type;
> >> +		phdr->p_flags  = phdr32.p_flags;
> >> +		phdr->p_offset = phdr32.p_offset;
> >> +		phdr->p_vaddr  = phdr32.p_vaddr;
> >> +		phdr->p_paddr  = phdr32.p_paddr;
> >> +		phdr->p_filesz = phdr32.p_filesz;
> >> +		phdr->p_memsz  = phdr32.p_memsz;
> >> +		phdr->p_align  = phdr32.p_align;
> >> +	}
> >> +	return TRUE;
> >> +}
> >> +
> >>  off_t
> >>  get_offset_pt_load_memory(void)
> >>  {
> >> diff --git a/elf_info.h b/elf_info.h
> >> index 801faff..0c67d74 100644
> >> --- a/elf_info.h
> >> +++ b/elf_info.h
> >> @@ -27,6 +27,19 @@
> >>
> >>  #define MAX_SIZE_NHDR	MAX(sizeof(Elf64_Nhdr), sizeof(Elf32_Nhdr))
> >>
> >> +struct pt_load_segment {
> >> +	off_t			file_offset;
> >> +	unsigned long long	phys_start;
> >> +	unsigned long long	phys_end;
> >> +	unsigned long long	virt_start;
> >> +	unsigned long long	virt_end;
> >> +};
> >> +
> >> +extern off_t			max_file_offset;
> >> +extern unsigned int		num_pt_loads;
> >> +extern struct pt_load_segment	*pt_loads;
> >> +
> >> +extern off_t			offset_pt_load_memory;
> >>
> >>  off_t paddr_to_offset(unsigned long long paddr);
> >>  off_t paddr_to_offset2(unsigned long long paddr, off_t hint);
> >> @@ -44,11 +57,14 @@ int get_elf64_ehdr(int fd, char *filename, Elf64_Ehdr *ehdr);
> >>  int get_elf32_ehdr(int fd, char *filename, Elf32_Ehdr *ehdr);
> >>  int get_elf_info(int fd, char *filename);
> >>  void free_elf_info(void);
> >> +int get_elf_loads(int fd, char *filename);
> >>
> >>  int is_elf64_memory(void);
> >>  int is_xen_memory(void);
> >>
> >>  int get_phnum_memory(void);
> >> +
> >> +int get_phdr_load(int index, Elf64_Phdr *phdr);
> >>  int get_phdr_memory(int index, Elf64_Phdr *phdr);
> >>  off_t get_offset_pt_load_memory(void);
> >>  int get_pt_load(int idx,
> >> @@ -68,6 +84,7 @@ void get_pt_note(off_t *offset, unsigned long *size);
> >>  int has_vmcoreinfo(void);
> >>  void set_vmcoreinfo(off_t offset, unsigned long size);
> >>  void get_vmcoreinfo(off_t *offset, unsigned long *size);
> >> +int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len);
> >>
> >>  int has_vmcoreinfo_xen(void);
> >>  void get_vmcoreinfo_xen(off_t *offset, unsigned long *size);
> >> diff --git a/makedumpfile.c b/makedumpfile.c
> >> index 34db997..ac02747 100644
> >> --- a/makedumpfile.c
> >> +++ b/makedumpfile.c
> >> @@ -5146,6 +5146,7 @@ create_dump_bitmap(void)
> >>
> >>  	if (info->flag_cyclic) {
> >>
> >> +		printf("create_dump_bitmap flag_cyclic\n");
> >>  		if (info->flag_elf_dumpfile) {
> >>  			if (!prepare_bitmap_buffer_cyclic())
> >>  				goto out;
> >> @@ -5189,14 +5190,23 @@ get_loads_dumpfile(void)
> >>
> >>  	initialize_2nd_bitmap(&bitmap2);
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> -
> >> -	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >>  			return FALSE;
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +	}
> >> +
> >> +	for (i = 0; i < num_pt_loads; i++) {
> >> +		if (info->flag_vmcore_estimate) {
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >> +
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		pfn_start = paddr_to_pfn(load.p_paddr);
> >>  		pfn_end   = paddr_to_pfn(load.p_paddr + load.p_memsz);
> >> @@ -5734,17 +5744,26 @@ write_elf_pages(struct cache_data *cd_header, struct cache_data *cd_page)
> >>  	off_seg_load    = info->offset_load_dumpfile;
> >>  	cd_page->offset = info->offset_load_dumpfile;
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >> +			return FALSE;
> >> +	}
> >>
> >>  	gettimeofday(&tv_start, NULL);
> >>
> >>  	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> -			return FALSE;
> >> +		if (info->flag_vmcore_estimate) {
> >> +			memset(&load, 0, sizeof(load));
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >>
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		off_memory= load.p_offset;
> >>  		paddr     = load.p_paddr;
> >> @@ -5923,14 +5942,24 @@ get_loads_dumpfile_cyclic(void)
> >>  	Elf64_Phdr load;
> >>  	struct cycle cycle = {0};
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >> +			return FALSE;
> >> +	}
> >>
> >>  	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> -			return FALSE;
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +		if (info->flag_vmcore_estimate) {
> >> +			memset(&load, 0, sizeof(load) );
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >> +
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		pfn_start = paddr_to_pfn(load.p_paddr);
> >>  		pfn_end = paddr_to_pfn(load.p_paddr + load.p_memsz);
> >> @@ -6016,17 +6045,26 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
> >>  	pfn_user = pfn_free = pfn_hwpoison = 0;
> >>  	pfn_memhole = info->max_mapnr;
> >>
> >> -	if (!(phnum = get_phnum_memory()))
> >> -		return FALSE;
> >> +	if (info->flag_vmcore_estimate) {
> >> +		phnum = num_pt_loads;
> >> +	} else {
> >> +		if (!(phnum = get_phnum_memory()))
> >> +			return FALSE;
> >> +	}
> >>
> >>  	gettimeofday(&tv_start, NULL);
> >>
> >>  	for (i = 0; i < phnum; i++) {
> >> -		if (!get_phdr_memory(i, &load))
> >> -			return FALSE;
> >> +		if (info->flag_vmcore_estimate) {
> >> +			memset(&load, 0, sizeof(load));
> >> +			get_phdr_load(i , &load);
> >> +		} else {
> >> +			if (!get_phdr_memory(i, &load))
> >> +				return FALSE;
> >>
> >> -		if (load.p_type != PT_LOAD)
> >> -			continue;
> >> +			if (load.p_type != PT_LOAD)
> >> +				continue;
> >> +		}
> >>
> >>  		off_memory= load.p_offset;
> >>  		paddr = load.p_paddr;
> >> @@ -8929,6 +8967,13 @@ check_param_for_creating_dumpfile(int argc, char *argv[])
> >>  		 */
> >>  		info->name_memory   = argv[optind];
> >>
> >> +	} else if ((argc == optind + 2) && info->flag_vmcore_estimate) {
> >> +		/*
> >> +		 * Parameters for get the /proc/kcore to estimate
> >> +		 * the size of dumped vmcore
> >> +		 */
> >> +		info->name_memory   = argv[optind];
> >> +		info->name_dumpfile = argv[optind+1];
> >>  	} else
> >>  		return FALSE;
> >>
> >> @@ -9011,6 +9056,332 @@ out:
> >>  	return free_size;
> >>  }
> >>
> >> +struct memory_range {
> >> +        unsigned long long start, end;
> >> +};
> >> +
> >> +#define CRASH_RESERVED_MEM_NR   8
> >> +static struct memory_range crash_reserved_mem[CRASH_RESERVED_MEM_NR];
> >> +static int crash_reserved_mem_nr;
> >> +
> >> +/*
> >> + * iomem_for_each_line()
> >> + *
> >> + * Iterate over each line in the file returned by proc_iomem(). If match is
> >> + * NULL or if the line matches with our match-pattern then call the
> >> + * callback if non-NULL.
> >> + *
> >> + * Return the number of lines matched.
> >> + */
> >> +int iomem_for_each_line(char *match,
> >> +			      int (*callback)(void *data,
> >> +					      int nr,
> >> +					      char *str,
> >> +					      unsigned long base,
> >> +					      unsigned long length),
> >> +			      void *data)
> >> +{
> >> +	const char iomem[] = "/proc/iomem";
> >> +	char line[MAX_LINE];
> >> +	FILE *fp;
> >> +	unsigned long long start, end, size;
> >> +	char *str;
> >> +	int consumed;
> >> +	int count;
> >> +	int nr = 0;
> >> +
> >> +	fp = fopen(iomem, "r");
> >> +	if (!fp) {
> >> +		ERRMSG("Cannot open %s\n", iomem);
> >> +		exit(1);
> >> +	}
> >> +
> >> +	while(fgets(line, sizeof(line), fp) != 0) {
> >> +		count = sscanf(line, "%Lx-%Lx : %n", &start, &end, &consumed);
> >> +		if (count != 2)
> >> +			continue;
> >> +		str = line + consumed;
> >> +		size = end - start + 1;
> >> +		if (!match || memcmp(str, match, strlen(match)) == 0) {
> >> +			if (callback
> >> +			    && callback(data, nr, str, start, size) < 0) {
> >> +				break;
> >> +			}
> >> +			nr++;
> >> +		}
> >> +	}
> >> +
> >> +	fclose(fp);
> >> +
> >> +	return nr;
> >> +}
> >> +
> >> +static int crashkernel_mem_callback(void *data, int nr,
> >> +                                          char *str,
> >> +                                          unsigned long base,
> >> +                                          unsigned long length)
> >> +{
> >> +        if (nr >= CRASH_RESERVED_MEM_NR)
> >> +                return 1;
> >> +
> >> +        crash_reserved_mem[nr].start = base;
> >> +        crash_reserved_mem[nr].end   = base + length - 1;
> >> +        return 0;
> >> +}
> >> +
> >> +int is_crashkernel_mem_reserved(void)
> >> +{
> >> +        int ret;
> >> +
> >> +        ret = iomem_for_each_line("Crash kernel\n",
> >> +                                        crashkernel_mem_callback, NULL);
> >> +        crash_reserved_mem_nr = ret;
> >> +
> >> +        return !!crash_reserved_mem_nr;
> >> +}
> >> +
> >> +/* Returns the physical address of start of crash notes buffer for a kernel. */
> >> +static int get_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
> >> +{
> >> +	char line[MAX_LINE];
> >> +	int count;
> >> +	FILE *fp;
> >> +	unsigned long long temp, temp2;
> >> +
> >> +	*addr = 0;
> >> +	*len = 0;
> >> +
> >> +	if (!(fp = fopen("/sys/kernel/vmcoreinfo", "r")))
> >> +		return -1;
> >> +
> >> +	if (!fgets(line, sizeof(line), fp))
> >> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
> >> +	count = sscanf(line, "%Lx %Lx", &temp, &temp2);
> >> +	if (count != 2)
> >> +		ERRMSG("Cannot parse %s: %s\n", "/sys/kernel/vmcoreinfo", strerror(errno));
> >> +
> >> +	*addr = (uint64_t) temp;
> >> +	*len = (uint64_t) temp2;
> >> +
> >> +	fclose(fp);
> >> +	return 0;
> >> +}
> >> +
> >> +
> >> +static int exclude_segment(struct pt_load_segment **pt_loads, unsigned int	*num_pt_loads, uint64_t start,
> >uint64_t end)
> >> +{
> >> +        int i, j, tidx = -1;
> >> +	unsigned long long	vstart, vend, kvstart, kvend;
> >> +        struct pt_load_segment temp_seg = {0};
> >> +	kvstart = (ulong)start | PAGE_OFFSET;
> >> +	kvend = (ulong)end | PAGE_OFFSET;
> >> +	unsigned long size;
> >> +
> >> +        for (i = 0; i < (*num_pt_loads); i++) {
> >> +                vstart = (*pt_loads)[i].virt_start;
> >> +                vend = (*pt_loads)[i].virt_end;
> >> +                if (kvstart <  vend && kvend > vstart) {
> >> +                        if (kvstart != vstart && kvend != vend) {
> >> +				/* Split load segment */
> >> +				temp_seg.phys_start = end +1;
> >> +				temp_seg.phys_end = (*pt_loads)[i].phys_end;
> >> +				temp_seg.virt_start = kvend + 1;
> >> +				temp_seg.virt_end = vend;
> >> +				temp_seg.file_offset = (*pt_loads)[i].file_offset + temp_seg.virt_start -
> >(*pt_loads)[i].virt_start;
> >> +
> >> +				(*pt_loads)[i].virt_end = kvstart - 1;
> >> +				(*pt_loads)[i].phys_end =  start -1;
> >> +
> >> +				tidx = i+1;
> >> +                        } else if (kvstart != vstart) {
> >> +				(*pt_loads)[i].phys_end = start - 1;
> >> +				(*pt_loads)[i].virt_end = kvstart - 1;
> >> +                        } else {
> >> +				(*pt_loads)[i].phys_start = end + 1;
> >> +				(*pt_loads)[i].virt_start = kvend + 1;
> >> +                        }
> >> +                }
> >> +        }
> >> +        /* Insert split load segment, if any. */
> >> +	if (tidx >= 0) {
> >> +		size = (*num_pt_loads + 1) * sizeof((*pt_loads)[0]);
> >> +		(*pt_loads) = realloc((*pt_loads), size);
> >> +		if  (!(*pt_loads) ) {
> >> +		    ERRMSG("Cannot realloc %ld bytes: %s\n",
> >> +		            size + 0UL, strerror(errno));
> >> +			exit(1);
> >> +		}
> >> +		for (j = (*num_pt_loads - 1); j >= tidx; j--)
> >> +		        (*pt_loads)[j+1] = (*pt_loads)[j];
> >> +		(*pt_loads)[tidx] = temp_seg;
> >> +		(*num_pt_loads)++;
> >> +        }
> >> +        return 0;
> >> +}
> >> +
> >> +static int
> >> +process_dump_load(struct pt_load_segment	*pls)
> >> +{
> >> +	unsigned long long paddr;
> >> +
> >> +	paddr = vaddr_to_paddr(pls->virt_start);
> >> +	pls->phys_start  = paddr;
> >> +	pls->phys_end    = paddr + (pls->virt_end - pls->virt_start);
> >> +	MSG("process_dump_load\n");
> >> +	MSG("  phys_start : %llx\n", pls->phys_start);
> >> +	MSG("  phys_end   : %llx\n", pls->phys_end);
> >> +	MSG("  virt_start : %llx\n", pls->virt_start);
> >> +	MSG("  virt_end   : %llx\n", pls->virt_end);
> >> +
> >> +	return TRUE;
> >> +}
> >> +
> >> +int get_kcore_dump_loads()
> >> +{
> >> +	struct pt_load_segment	*pls;
> >> +	int i, j, loads=0;
> >> +	unsigned long long paddr;
> >> +
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		if (is_vmalloc_addr(p->virt_start))
> >> +			continue;
> >> +		loads++;
> >> +	}
> >> +
> >> +	pls = calloc(sizeof(struct pt_load_segment), j);
> >> +	if (pls == NULL) {
> >> +		ERRMSG("Can't allocate memory for the PT_LOAD. %s\n",
> >> +		    strerror(errno));
> >> +		return FALSE;
> >> +	}
> >> +
> >> +	for (i = 0, j=0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		if (is_vmalloc_addr(p->virt_start))
> >> +			continue;
> >> +		if (j >= loads)
> >> +			return FALSE;
> >> +
> >> +		if (j == 0) {
> >> +			offset_pt_load_memory = p->file_offset;
> >> +			if (offset_pt_load_memory == 0) {
> >> +				ERRMSG("Can't get the offset of page data.\n");
> >> +				return FALSE;
> >> +			}
> >> +		}
> >> +
> >> +		pls[j] = *p;
> >> +		process_dump_load(&pls[j]);
> >> +		j++;
> >> +	}
> >> +
> >> +	free(pt_loads);
> >> +	pt_loads = pls;
> >> +	num_pt_loads = loads;
> >> +
> >> +	for (i=0; i<crash_reserved_mem_nr; i++)
> >> +	{
> >> +		exclude_segment(&pt_loads, &num_pt_loads, crash_reserved_mem[i].start, crash_reserved_mem[i].end);
> >> +	}
> >> +
> >> +	max_file_offset = 0;
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		max_file_offset = MAX(max_file_offset,
> >> +				      p->file_offset + p->phys_end - p->phys_start);
> >> +	}
> >> +
> >> +	for (i = 0; i < num_pt_loads; ++i) {
> >> +		struct pt_load_segment *p = &pt_loads[i];
> >> +		MSG("LOAD (%d)\n", i);
> >> +		MSG("  phys_start : %llx\n", p->phys_start);
> >> +		MSG("  phys_end   : %llx\n", p->phys_end);
> >> +		MSG("  virt_start : %llx\n", p->virt_start);
> >> +		MSG("  virt_end   : %llx\n", p->virt_end);
> >> +	}
> >> +
> >> +	return TRUE;
> >> +}
> >> +
> >> +int get_page_offset()
> >> +{
> >> +	struct utsname utsname;
> >> +	if (uname(&utsname)) {
> >> +		ERRMSG("Cannot get name and information about current kernel : %s", strerror(errno));
> >> +		return FALSE;
> >> +	}
> >> +
> >> +	info->kernel_version = get_kernel_version(utsname.release);
> >> +	get_versiondep_info_x86_64();
> >> +	return TRUE;
> >> +}
> >> +
> >> +int vmcore_estimate(void)
> >> +{
> >> +	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
> >> +	int num_retry, status;
> >> +
> >> +	if (!is_crashkernel_mem_reserved()) {
> >> +		ERRMSG("No memory is reserved for crashkenrel!\n");
> >> +		exit(1);
> >> +	}
> >> +
> >> +	get_page_offset();
> >> +
> >> +#if 1
> >> +	if (!open_dump_memory())
> >> +		return FALSE;
> >> +#endif
> >> +
> >> +	if (info->flag_vmcore_estimate) {
> >> +		if (!get_elf_loads(info->fd_memory, info->name_memory))
> >> +			return FALSE;
> >> +	}
> >> +
> >> +	if (get_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> >> +		return FALSE;
> >> +
> >> +	if (set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> >> +		return FALSE;
> >> +
> >> +	if (!get_kcore_dump_loads())
> >> +		return FALSE;
> >> +
> >> +#if 1
> >> +	if (!initial())
> >> +		return FALSE;
> >> +#endif
> >> +
> >> +retry:
> >> +	if (!create_dump_bitmap())
> >> +		return FALSE;
> >> +
> >> +	if ((status = writeout_dumpfile()) == FALSE)
> >> +		return FALSE;
> >> +
> >> +	if (status == NOSPACE) {
> >> +		/*
> >> +		 * If specifying the other dump_level, makedumpfile tries
> >> +		 * to create a dumpfile with it again.
> >> +		 */
> >> +		num_retry++;
> >> +		if ((info->dump_level = get_next_dump_level(num_retry)) < 0)
> >> +			return FALSE;
> >> +		MSG("Retry to create a dumpfile by dump_level(%d).\n",
> >> +		    info->dump_level);
> >> +		if (!delete_dumpfile())
> >> +			return FALSE;
> >> +		goto retry;
> >> +	}
> >> +	print_report();
> >> +
> >> +	clear_filter_info();
> >> +	if (!close_files_for_creating_dumpfile())
> >> +		return FALSE;
> >> +
> >> +	return TRUE;
> >> +}
> >>
> >>  /*
> >>   * Choose the lesser value of the two below as the size of cyclic buffer.
> >> @@ -9063,6 +9434,7 @@ static struct option longopts[] = {
> >>  	{"cyclic-buffer", required_argument, NULL, OPT_CYCLIC_BUFFER},
> >>  	{"eppic", required_argument, NULL, OPT_EPPIC},
> >>  	{"non-mmap", no_argument, NULL, OPT_NON_MMAP},
> >> +	{"vmcore-estimate", no_argument, NULL, OPT_VMCORE_ESTIMATE},
> >>  	{0, 0, 0, 0}
> >>  };
> >>
> >> @@ -9154,6 +9526,9 @@ main(int argc, char *argv[])
> >>  		case OPT_DUMP_DMESG:
> >>  			info->flag_dmesg = 1;
> >>  			break;
> >> +		case OPT_VMCORE_ESTIMATE:
> >> +			info->flag_vmcore_estimate = 1;
> >> +			break;
> >>  		case OPT_COMPRESS_SNAPPY:
> >>  			info->flag_compress = DUMP_DH_COMPRESSED_SNAPPY;
> >>  			break;
> >> @@ -9294,6 +9669,19 @@ main(int argc, char *argv[])
> >>
> >>  		MSG("\n");
> >>  		MSG("The dmesg log is saved to %s.\n", info->name_dumpfile);
> >> +	} else if (info->flag_vmcore_estimate) {
> >> +#if 1
> >> +		if (!check_param_for_creating_dumpfile(argc, argv)) {
> >> +			MSG("Commandline parameter is invalid.\n");
> >> +			MSG("Try `makedumpfile --help' for more information.\n");
> >> +			goto out;
> >> +		}
> >> +#endif
> >> +		if (!vmcore_estimate())
> >> +			goto out;
> >> +
> >> +		MSG("\n");
> >> +		MSG("vmcore size estimate successfully.\n");
> >>  	} else {
> >>  		if (!check_param_for_creating_dumpfile(argc, argv)) {
> >>  			MSG("Commandline parameter is invalid.\n");
> >> diff --git a/makedumpfile.h b/makedumpfile.h
> >> index 9402f05..c401337 100644
> >> --- a/makedumpfile.h
> >> +++ b/makedumpfile.h
> >> @@ -216,6 +216,9 @@ isAnon(unsigned long mapping)
> >>  #define FILENAME_STDOUT		"STDOUT"
> >>  #define MAP_REGION		(4096*1024)
> >>
> >> +#define MAX_LINE	160
> >> +
> >> +
> >>  /*
> >>   * Minimam vmcore has 2 ProgramHeaderTables(PT_NOTE and PT_LOAD).
> >>   */
> >> @@ -910,6 +913,7 @@ struct DumpInfo {
> >>  	int		flag_force;	     /* overwrite existing stuff */
> >>  	int		flag_exclude_xen_dom;/* exclude Domain-U from xen-kdump */
> >>  	int             flag_dmesg;          /* dump the dmesg log out of the vmcore file */
> >> +	int             flag_vmcore_estimate;          /* estimate the size  of vmcore in current system */
> >>  	int		flag_use_printk_log; /* did we read printk_log symbol name? */
> >>  	int		flag_nospace;	     /* the flag of "No space on device" error */
> >>  	int		flag_vmemmap;        /* kernel supports vmemmap address space */
> >> @@ -1764,6 +1768,7 @@ struct elf_prstatus {
> >>  #define OPT_CYCLIC_BUFFER       OPT_START+11
> >>  #define OPT_EPPIC               OPT_START+12
> >>  #define OPT_NON_MMAP            OPT_START+13
> >> +#define OPT_VMCORE_ESTIMATE            OPT_START+14
> >>
> >>  /*
> >>   * Function Prototype.
> >> --
> >> 1.8.5.3
> >>
> >>
> >> _______________________________________________
> >> kexec mailing list
> >> kexec@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-11 12:39 [PATCH] Makedumpfile: vmcore size estimate Baoquan He
  2014-06-11 13:57 ` Baoquan He
@ 2014-06-23 12:36 ` Vivek Goyal
  2014-06-23 13:05   ` Baoquan He
  1 sibling, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2014-06-23 12:36 UTC (permalink / raw)
  To: Baoquan He; +Cc: kumagai-atsushi, kexec

On Wed, Jun 11, 2014 at 08:39:05PM +0800, Baoquan He wrote:
> User want to get a rough estimate of vmcore size, then they can decide
> how much storage space is reserved for vmcore dumping. This can help them
> to deploy their machines better, possibly hundreds of machines.
> 
> In this draft patch, a new configuration option is added,
>     "--vmcore-estimate"
> User can execute below command to get a dumped kcore. Since kcore is a
> elf file to map the whole memory of current kernel, it's  equal to the
> memory of crash kernel though it's not exact. Content of kcore is dynamic
> though /proc/vmcore is fixed once crash happened. But for vmcore size
> estimate, it is better enough.
> 
> sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
> 
> Questions:
> 1. Or we can get the dumpable page numbers only, then calculate the estimated
> vmcore size by a predifined factor if it's kdump compressed dumping. E.g if
> lzo dump, we assume the compression ratio is 45%, then the estimate size is
> equal to: (dumpable page numbers) * 4096* 45%.

I think we probably cannot guess the savings from compression, since the
compression ratio varies with the content of each page. So if we keep it
simple and just calculate the number of pages which will be dumped and
multiply it by the page size, that number will be much more accurate
(for the current system).
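
As a purely illustrative calculation of that simple approach, with
made-up numbers: if 1,500,000 pages remain dumpable after filtering at
dump level 31, the uncompressed estimate is

    1,500,000 pages * 4096 bytes/page ~= 5.7 GiB

and any compression applied at crash time only reduces it from there.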

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-20  1:07   ` Atsushi Kumagai
  2014-06-20  1:58     ` bhe
  2014-06-20  2:33     ` bhe
@ 2014-06-23 12:57     ` Vivek Goyal
  2014-06-26  8:21       ` Atsushi Kumagai
  2 siblings, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2014-06-23 12:57 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec, bhe

On Fri, Jun 20, 2014 at 01:07:52AM +0000, Atsushi Kumagai wrote:
> Hello Baoquan,
> 
> >Forget to mention only x86-64 is processed in this patch.
> >
> >On 06/11/14 at 08:39pm, Baoquan He wrote:
> >> User want to get a rough estimate of vmcore size, then they can decide
> >> how much storage space is reserved for vmcore dumping. This can help them
> >> to deploy their machines better, possibly hundreds of machines.
> 
> You suggested this feature before, but I don't still agree with this.
> 
> No one can guarantee that the vmcore size will be below the estimated
> size every time. However, if makedumpfile provides "--vmcore-estimate",
> some users may trust it completely and disk overflow might happen. 
> Ideally, users should prepare the disk which can store the possible
> maximum size of vmcore. Of course they can reduce the disk size on their
> responsibility, but makedumpfile can't help it as official feature.

Hi Atsushi,

Recently quite a few people have asked us for this feature. They manage
lots of systems and have a local disk or partition attached for saving
the dump. Now say a system has a few terabytes of memory; dedicating a
partition of a few terabytes per machine just for saving a dump might
not be very practical.

I was given the example that AIX supports this kind of estimate too, and
in fact it looks like it leaves a message if it finds that the current
dump partition size will not be sufficient to save the dump.

I think it is a good idea to try to solve this problem. We might not be
accurate, but it will be better than the user guessing by how much to
reduce the partition size.

I am wondering what the technical concerns are. IIUC, the biggest problem
is that the number of pages dumped will vary as the system continues to
run. So immediately after boot the number of pages to be dumped might be
small, but as more applications are launched the number of pages to be
dumped will most likely increase.

We can try to mitigate the above problem by creating a new service which
runs at a configured interval and checks the size of memory required for
the dump against the size of the configured dump partition. The user can
either disable this service or configure it to run every hour, every day,
every week, or at any interval they like.

So as long as we can come up with a tool which can guess the number of
pages to be dumped fairly accurately, we should have a reasonably good
system. It will at least be much better than the user guessing the size
of the dump partition.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-23 12:36 ` Vivek Goyal
@ 2014-06-23 13:05   ` Baoquan He
  2014-06-23 13:55     ` Anders Rayner-Karlsson
  0 siblings, 1 reply; 16+ messages in thread
From: Baoquan He @ 2014-06-23 13:05 UTC (permalink / raw)
  To: Vivek Goyal, tak, gbarros; +Cc: kumagai-atsushi, kexec

On 06/23/14 at 08:36am, Vivek Goyal wrote:
> On Wed, Jun 11, 2014 at 08:39:05PM +0800, Baoquan He wrote:
> > sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
> > 
> > Questions:
> > 1. Or we can get the dumpable page numbers only, then calculate the estimated
> > vmcore size by a predifined factor if it's kdump compressed dumping. E.g if
> > lzo dump, we assume the compression ratio is 45%, then the estimate size is
> > equal to: (dumpable page numbers) * 4096* 45%.
> 
> I think that we can probably not guess the saving from compression.
> Compression ratio varies based on content of page. So if we keep it simple
> and just calculate the number of pages which will be dumped and multiply
> it by page size, that number will be much more accurate (for current
> system).

Yeah, this is another plan. Then the output will be the size of the ELF
dump. If users have configured kdump compression, such as lzo or snappy,
they can estimate the compressed size themselves. That is fine if users
can accept it, since this number is much more accurate.
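
For example (figures purely illustrative): with an estimated ELF size of
8 GiB and a workload that has historically compressed to about 45% under
lzo, a user might budget roughly 8 GiB * 0.45 ~= 3.6 GiB, plus some
headroom, for the compressed kdump file.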

Hi Anders,

Do you have any comments on this? I found you are concerned with this
issue too.

Thanks
Baoquan


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-23 13:05   ` Baoquan He
@ 2014-06-23 13:55     ` Anders Rayner-Karlsson
  0 siblings, 0 replies; 16+ messages in thread
From: Anders Rayner-Karlsson @ 2014-06-23 13:55 UTC (permalink / raw)
  To: Baoquan He; +Cc: Barros Guilherme, kumagai-atsushi, kexec, Vivek Goyal




On 23 Jun 2014, at 15:05, Baoquan He <bhe@redhat.com> wrote:

> On 06/23/14 at 08:36am, Vivek Goyal wrote:
>> On Wed, Jun 11, 2014 at 08:39:05PM +0800, Baoquan He wrote:
>>> sudo makedumpfile -E -d 31 --vmcore-estimate /proc/kcore /var/crash/kcore-dump
>>> 
>>> Questions:
>>> 1. Or we can get the dumpable page numbers only, then calculate the estimated
>>> vmcore size by a predifined factor if it's kdump compressed dumping. E.g if
>>> lzo dump, we assume the compression ratio is 45%, then the estimate size is
>>> equal to: (dumpable page numbers) * 4096* 45%.
>> 
>> I think that we can probably not guess the saving from compression.
>> Compression ratio varies based on content of page. So if we keep it simple
>> and just calculate the number of pages which will be dumped and multiply
>> it by page size, that number will be much more accurate (for current
>> system).
> 
> Yeah, this is another plan. Then the output will be the size of elf
> dump. If user configured the kdump compression, such as lzo/snappy, they
> can just estimate it by themselves. It's OK if user can accept this
> since this is much more accurate.
> 
> Hi Anders,
> 
> Do you have any comments on this? I found you are concerned with this
> issue too.

Hi there,

The only comment I have is that I like your approach and that I agree
we will not be able to accurately predict the compression ratio. While
we could run the compression to get an accurate prediction, it would
not be accurate five minutes later, so it is best not to go there at
all. Just being able to tell how big the dump will be at a specific
dump level, uncompressed, will be very welcome to customers and
partners.

It allows them to gauge how much space they need to set aside for
/var/crash or wherever they need to write the dump. The old perl script
I did after speaking with Neil Horman was only ever able to predict
dump level 31, and that is not always enough. When changing to 17 or 1
for a specific problem, they want to know how much space they need for
that.
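
For reference, makedumpfile's dump level is a bitmask of page types to
exclude, roughly: 1 zero pages, 2 and 4 page-cache pages (the latter
including private cache), 8 user-process data, 16 free pages. So
31 = 1+2+4+8+16 excludes all of them, 17 = 1+16 drops only zero and
free pages, and 1 drops only zero pages, which is why each level needs
its own space figure.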

So, I am very happy to see this effort underway. :)

Thank you,

--
Anders Rayner-Karlsson / akarlsso@redhat.com / +46-76-805-2173
Principal Technical Account Manager & Support Account Director
         Red Hat Strategic Customer Engagement Team




_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-23 12:57     ` Vivek Goyal
@ 2014-06-26  8:21       ` Atsushi Kumagai
  2014-06-26 12:31         ` Vivek Goyal
  0 siblings, 1 reply; 16+ messages in thread
From: Atsushi Kumagai @ 2014-06-26  8:21 UTC (permalink / raw)
  To: vgoyal; +Cc: kexec, bhe

>On Fri, Jun 20, 2014 at 01:07:52AM +0000, Atsushi Kumagai wrote:
>> Hello Baoquan,
>>
>> >Forget to mention only x86-64 is processed in this patch.
>> >
>> >On 06/11/14 at 08:39pm, Baoquan He wrote:
>> >> User want to get a rough estimate of vmcore size, then they can decide
>> >> how much storage space is reserved for vmcore dumping. This can help them
>> >> to deploy their machines better, possibly hundreds of machines.
>>
>> You suggested this feature before, but I don't still agree with this.
>>
>> No one can guarantee that the vmcore size will be below the estimated
>> size every time. However, if makedumpfile provides "--vmcore-estimate",
>> some users may trust it completely and disk overflow might happen.
>> Ideally, users should prepare the disk which can store the possible
>> maximum size of vmcore. Of course they can reduce the disk size on their
>> responsibility, but makedumpfile can't help it as official feature.
>
>Hi Atsushi,
>
>Recently quite a few people have asked us for this feature. They manage
>lots of system and have attached local disk or partition for saving
>dump. Now say a system has few Tera bytes of memory and dedicating one
>partition of size of few tera bytes per machine just for saving dump might
>not be very practical.
>
>I was given the example that AIX supports this kind of estimates too and
>in fact looks like they leave a message if they find that current dump
>partition size will not be sufficient to save dump.
>
>I think it is a good idea to try to solve this problem. We might not be
>accurate but it will be better than user guessing that by how much to
>reduce the partition size.
>
>I am wondering what are the technical concerns. IIUC, biggest problem is that
>number of pages dumped will vary as system continues to run. So
>immediately after boot number of pages to be dumped might be small but
>as more applications are launched, number of pages to be dumped will
>incresae, most likely.

Yes, the actual dump size will be different from the estimated size, so
this feature sounds incomplete and irresponsible to me. The size gap may
be small in some cases, especially when the dump level is 31, but it is
basically variable, so I don't want makedumpfile to show such an
inaccurate value.

>We can try to mitigate above problem by creating a new service which can
>run at configured interval and check the size of memory required for
>dump and size of dump partition configured. And user can either disable
>this service or configure it to run every hour or every day or every week
>or any interval they like to.

I think it's too much work; why do you want to check the required disk
size so frequently? Getting the order of magnitude once should be enough
to prepare the dump disk, since we should add some margin to the disk
size to absorb fluctuation in the vmcore size. However, that depends on
the machine's workload; makedumpfile can't provide an estimation factor
which can be applied to everyone.


Thanks
Atsushi Kumagai

>So as long as we can come up with a tool which can guess number of pages
>to be dumped fairly accurately, we should have a reasonably good system.
>It will atleast be much better than user guessing the size of dump parition.
>
>Thanks
>Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-26  8:21       ` Atsushi Kumagai
@ 2014-06-26 12:31         ` Vivek Goyal
  2014-07-02  0:25           ` Atsushi Kumagai
  0 siblings, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2014-06-26 12:31 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec, bhe

On Thu, Jun 26, 2014 at 08:21:58AM +0000, Atsushi Kumagai wrote:
> >On Fri, Jun 20, 2014 at 01:07:52AM +0000, Atsushi Kumagai wrote:
> >> Hello Baoquan,
> >>
> >> >Forget to mention only x86-64 is processed in this patch.
> >> >
> >> >On 06/11/14 at 08:39pm, Baoquan He wrote:
> >> >> User want to get a rough estimate of vmcore size, then they can decide
> >> >> how much storage space is reserved for vmcore dumping. This can help them
> >> >> to deploy their machines better, possibly hundreds of machines.
> >>
> >> You suggested this feature before, but I don't still agree with this.
> >>
> >> No one can guarantee that the vmcore size will be below the estimated
> >> size every time. However, if makedumpfile provides "--vmcore-estimate",
> >> some users may trust it completely and disk overflow might happen.
> >> Ideally, users should prepare the disk which can store the possible
> >> maximum size of vmcore. Of course they can reduce the disk size on their
> >> responsibility, but makedumpfile can't help it as official feature.
> >
> >Hi Atsushi,
> >
> >Recently quite a few people have asked us for this feature. They manage
> >lots of system and have attached local disk or partition for saving
> >dump. Now say a system has few Tera bytes of memory and dedicating one
> >partition of size of few tera bytes per machine just for saving dump might
> >not be very practical.
> >
> >I was given the example that AIX supports this kind of estimates too and
> >in fact looks like they leave a message if they find that current dump
> >partition size will not be sufficient to save dump.
> >
> >I think it is a good idea to try to solve this problem. We might not be
> >accurate but it will be better than user guessing that by how much to
> >reduce the partition size.
> >
> >I am wondering what are the technical concerns. IIUC, biggest problem is that
> >number of pages dumped will vary as system continues to run. So
> >immediately after boot number of pages to be dumped might be small but
> >as more applications are launched, number of pages to be dumped will
> >incresae, most likely.
> 
> Yes, the actual dump size will be different from the estimated size, so 
> this feature sounds incomplete and irresponsible to me.
> The gap of size may be small in some cases especially when the dump level
> is 31, but it will be variable basically, therefore I don't want to show
> such an inaccurate value by makedumpfile.

Hi Atsushi,

I am wondering why it would not be accurate, or at least close to
accurate. Technically we should be able to walk through all the struct
pages and decide which ones will be dumped based on the filtering level,
and that should give us a pretty good idea of the dump size, shouldn't it?
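
A minimal sketch of that walk, assuming makedumpfile's existing
DL_EXCLUDE_* dump-level bits; the is_*_page() helpers and the loop
itself are hypothetical stand-ins for the real bitmap/mem_map logic,
not code from the patch:

    /*
     * Hypothetical sketch: count the pages a given dump_level would
     * keep, without writing anything.  DL_EXCLUDE_* are makedumpfile's
     * dump-level bits; the is_*_page() helpers stand in for the real
     * mem_map walking code (hwpoison and private-cache handling
     * omitted for brevity).
     */
    unsigned long long count_dumpable_pages(int dump_level)
    {
            unsigned long long pfn, dumpable = 0;

            for (pfn = 0; pfn < info->max_mapnr; pfn++) {
                    if (!is_in_segs(pfn_to_paddr(pfn)))   /* memory hole */
                            continue;
                    if ((dump_level & DL_EXCLUDE_FREE) && is_free_page(pfn))
                            continue;
                    if ((dump_level & DL_EXCLUDE_ZERO) && is_zero_page(pfn))
                            continue;
                    if ((dump_level & DL_EXCLUDE_CACHE) && is_cache_page(pfn))
                            continue;
                    if ((dump_level & DL_EXCLUDE_USER_DATA) && is_user_page(pfn))
                            continue;
                    dumpable++;
            }
            return dumpable;
    }

Multiplying the returned count by the page size gives the figure
discussed above.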

So for a *given moment in time* our estimates should be pretty close for
all dump levels.

What is variable, though, is that this estimate will change as more
applications are launched and devices come and go, since that forces
allocation of new memory and hence changes the size of the dump.

And that we should be able to handle with the help of another service.

> 
> >We can try to mitigate above problem by creating a new service which can
> >run at configured interval and check the size of memory required for
> >dump and size of dump partition configured. And user can either disable
> >this service or configure it to run every hour or every day or every week
> >or any interval they like to.
> 
> I think it's too much work, why do you want to check the required disk
> size so frequently?

The interval can be dynamic. Checking the estimate right after a fresh
boot will not make much sense, as the system is not loaded at all.

Maybe run it once a month if you think once a week is too much.

> To get the order of magnitude once will be enough to
> prepare the disk for dump since we should include some space in the disk size
> to absorb the fluctuation of the vmcore size.

I think that's also fine. That can be a first step, and if it works we
don't have to create a service to check it regularly. I am only concerned
that an estimate taken right after boot can differ a lot on large
machines once all the applications are launched.

> However, it depends on the
> machine's work, makedumpfile can't provide the index for estimation which
> can be applied to everyone.

I still don't understand why makedumpfile can't reasonably provide an
estimate *of that moment*.

I don't want to implement a separate utility for this, as makedumpfile
already has all the logic to go through pages, prepare bitmaps and figure
out which ones will be dumped. A separate tool would just duplicate code
and waste effort.

We have a real problem on our hands. What do we tell customers about how
big their dump partition should be? They have multi-terabyte machines. Do
we tell them to create a dedicated multi-terabyte dump partition? That's
not practical at all.

And asking them to guess is not reasonable either. makedumpfile can make
much more educated guesses. It is not perfect, but it is still much
better than the user making a wild guess.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH] Makedumpfile: vmcore size estimate
  2014-06-26 12:31         ` Vivek Goyal
@ 2014-07-02  0:25           ` Atsushi Kumagai
  2014-07-02  8:13             ` bhe
  0 siblings, 1 reply; 16+ messages in thread
From: Atsushi Kumagai @ 2014-07-02  0:25 UTC (permalink / raw)
  To: vgoyal, bhe; +Cc: kexec

>On Thu, Jun 26, 2014 at 08:21:58AM +0000, Atsushi Kumagai wrote:
>> >On Fri, Jun 20, 2014 at 01:07:52AM +0000, Atsushi Kumagai wrote:
>> >> Hello Baoquan,
>> >>
>> >> >Forget to mention only x86-64 is processed in this patch.
>> >> >
>> >> >On 06/11/14 at 08:39pm, Baoquan He wrote:
>> >> >> User want to get a rough estimate of vmcore size, then they can decide
>> >> >> how much storage space is reserved for vmcore dumping. This can help them
>> >> >> to deploy their machines better, possibly hundreds of machines.
>> >>
>> >> You suggested this feature before, but I don't still agree with this.
>> >>
>> >> No one can guarantee that the vmcore size will be below the estimated
>> >> size every time. However, if makedumpfile provides "--vmcore-estimate",
>> >> some users may trust it completely and disk overflow might happen.
>> >> Ideally, users should prepare the disk which can store the possible
>> >> maximum size of vmcore. Of course they can reduce the disk size on their
>> >> responsibility, but makedumpfile can't help it as official feature.
>> >
>> >Hi Atsushi,
>> >
>> >Recently quite a few people have asked us for this feature. They manage
>> >lots of system and have attached local disk or partition for saving
>> >dump. Now say a system has few Tera bytes of memory and dedicating one
>> >partition of size of few tera bytes per machine just for saving dump might
>> >not be very practical.
>> >
>> >I was given the example that AIX supports this kind of estimates too and
>> >in fact looks like they leave a message if they find that current dump
>> >partition size will not be sufficient to save dump.
>> >
>> >I think it is a good idea to try to solve this problem. We might not be
>> >accurate but it will be better than user guessing that by how much to
>> >reduce the partition size.
>> >
>> >I am wondering what are the technical concerns. IIUC, biggest problem is that
>> >number of pages dumped will vary as system continues to run. So
>> >immediately after boot number of pages to be dumped might be small but
>> >as more applications are launched, number of pages to be dumped will
>> >incresae, most likely.
>>
>> Yes, the actual dump size will be different from the estimated size, so
>> this feature sounds incomplete and irresponsible to me.
>> The gap of size may be small in some cases especially when the dump level
>> is 31, but it will be variable basically, therefore I don't want to show
>> such an inaccurate value by makedumpfile.
>
>Hi Atsushi,
>
>I am wondering why it will not be accurate or atleast close to accurate.
>Technically we should be able to walk through all the struct pages and
>decide which ones will be dumped based on filtering level. And that should
>give us pretty good idea of dump size. Isn't it.
>
>So for a *given momment in time* our estimates should be pretty close for
>all dump levels.
>
>What is variable though is that this estimate will change as more
>applications are launched and devices come and go as that will force
>allocation of new memory hence size of dump.
>
>And that we should be able to handle with the help of another service.
>
>>
>> >We can try to mitigate above problem by creating a new service which can
>> >run at configured interval and check the size of memory required for
>> >dump and size of dump partition configured. And user can either disable
>> >this service or configure it to run every hour or every day or every week
>> >or any interval they like to.
>>
>> I think it's too much work, why do you want to check the required disk
>> size so frequently?
>
>Interval can be dynamic. Checking estimate after fresh boot will not make
>sense as system is not loaded at all.
>
>May be run it once a month if you think once a week is too much.
>
>> To get the order of magnitude once will be enough to
>> prepare the disk for dump since we should include some space in the disk size
>> to absorb the fluctuation of the vmcore size.
>
>I think that's also fine. That can be first step and if that works we
>don't have to create a service to check it regularly. I am only concerned
>that estimate taken after boot can vary a lot on large machines once all
>the applications are launched.
>
>> However, it depends on the
>> machine's work, makedumpfile can't provide the index for estimation which
>> can be applied to everyone.
>
>I still don't understand that why makedumpfile can't provide an estimate
>* of that momement * reasonably.

I think *estimate* is an inappropriate way to describe this feature,
since it just analyzes the memory usage at that moment; I want to avoid
the misunderstanding that this feature is a prediction.

>I don't want to implement a separate utility for this as makedumpfile
>already has all the logic to go through pages, prepare bitmaps and figure
>out which ones will be dumped. It will be just duplication of code and
>waste of effort.
>
>We have a real problem at our hand. What do we tell customers that how
>big your dump partition should be. They have a multi tera byte machine. Do
>we tell them that create a multi tera byte dedicated dump partition.
>That's not practical at all.
>
>And asking them to guess is not reasonable either. makedumpfile can make
>much more educated guesses. It is not perfect but it is still much better
>than user making a wild guess.

Well, fine. I have two requests for accepting this feature:

  - Make this feature as simple as possible.
    I don't want to take time to maintain this, so I prefer the
    other idea which, as Baoquan said, is like HP-UX's feature.

  - Don't provide this feature as a "vmcore size estimate"; it should
    just show the number of dumpable pages at the moment, and please
    show a WARNING message to inform users about that (a rough
    illustration follows below).
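
A rough illustration of the kind of output this implies (wording and
numbers hypothetical, not from any released makedumpfile):

    Total pages on system          : 4194304
    Number of dumpable pages       : 1048576  (dump_level 31)
    WARNING: this reflects memory usage at this moment only; the page
    count, and hence the vmcore size, may be larger when a crash
    actually happens.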

Could you remake your patch, Baoquan?


Thanks
Atsushi Kumagai

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-07-02  0:25           ` Atsushi Kumagai
@ 2014-07-02  8:13             ` bhe
  2014-07-04  1:53               ` HATAYAMA, Daisuke
  0 siblings, 1 reply; 16+ messages in thread
From: bhe @ 2014-07-02  8:13 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec, vgoyal

On 07/02/14 at 12:25am, Atsushi Kumagai wrote:
> >On Thu, Jun 26, 2014 at 08:21:58AM +0000, Atsushi Kumagai wrote:

> >I still don't understand that why makedumpfile can't provide an estimate
> >* of that momement * reasonably.
> 
> I think *estimate* is inappropriate to express this feature since it just
> analyze the memory usage at that moment, I want to avoid a misunderstanding
> that this feature is a prediction. 

I agree that describing it as analyzing memory usage is better.

> 
> >I don't want to implement a separate utility for this as makedumpfile
> >already has all the logic to go through pages, prepare bitmaps and figure
> >out which ones will be dumped. It will be just duplication of code and
> >waste of effort.
> >
> >We have a real problem at our hand. What do we tell customers that how
> >big your dump partition should be. They have a multi tera byte machine. Do
> >we tell them that create a multi tera byte dedicated dump partition.
> >That's not practical at all.
> >
> >And asking them to guess is not reasonable either. makedumpfile can make
> >much more educated guesses. It is not perfect but it is still much better
> >than user making a wild guess.
> 
> Well, fine. I have 2 requests for accepting this feature:
> 
>   - Make this feature as simple as possible.
>     I don't want to take time to maintain this, so I prefer the
>     other idea which, Baoquan said, is like HP UX's feature.

OK, I will try. In fact it's simple to just show the number of dumpable
pages.

> 
>   - Don't provide this feature as "vmcore size estimate", it just
>     show the number of dumpable pages at the moment. Then please show
>     the WARNING message to inform users about it.

OK, good suggestion. Will do.

> 
> Could you remake your patch, Baoquan?
> 
> 
> Thanks
> Atsushi Kumagai
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-07-02  8:13             ` bhe
@ 2014-07-04  1:53               ` HATAYAMA, Daisuke
  2014-07-04  9:44                 ` bhe
  2014-07-11 13:51                 ` Vivek Goyal
  0 siblings, 2 replies; 16+ messages in thread
From: HATAYAMA, Daisuke @ 2014-07-04  1:53 UTC (permalink / raw)
  To: bhe, Atsushi Kumagai; +Cc: kexec, vgoyal



(2014/07/02 17:13), bhe@redhat.com wrote:
> On 07/02/14 at 12:25am, Atsushi Kumagai wrote:
>>> On Thu, Jun 26, 2014 at 08:21:58AM +0000, Atsushi Kumagai wrote:
>
>>> I still don't understand that why makedumpfile can't provide an estimate
>>> * of that momement * reasonably.
>>
>> I think *estimate* is inappropriate to express this feature since it just
>> analyze the memory usage at that moment, I want to avoid a misunderstanding
>> that this feature is a prediction.
>
> I agree that understanding it as analyze memory usage is better.
>
>>
>>> I don't want to implement a separate utility for this as makedumpfile
>>> already has all the logic to go through pages, prepare bitmaps and figure
>>> out which ones will be dumped. It will be just duplication of code and
>>> waste of effort.
>>>
>>> We have a real problem at our hand. What do we tell customers that how
>>> big your dump partition should be. They have a multi tera byte machine. Do
>>> we tell them that create a multi tera byte dedicated dump partition.
>>> That's not practical at all.
>>>
>>> And asking them to guess is not reasonable either. makedumpfile can make
>>> much more educated guesses. It is not perfect but it is still much better
>>> than user making a wild guess.
>>
>> Well, fine. I have 2 requests for accepting this feature:
>>
>>    - Make this feature as simple as possible.
>>      I don't want to take time to maintain this, so I prefer the
>>      other idea which, Baoquan said, is like HP UX's feature.
>
> OK, I will try. In fact it's simple to just show the number of dumpable
> pages.
>
>>
>>    - Don't provide this feature as "vmcore size estimate", it just
>>      show the number of dumpable pages at the moment. Then please show
>>      the WARNING message to inform users about it.
>
> OK, good suggestion. Will do.
>

There are several things that make users have to guess at the actual
vmcore size: not only compression, but also a vmcore's additional
metadata (note data, headers, and bitmaps). I think it is important to
stress that the actual dump size will be larger than the displayed
number of pages suggests, so users should allow enough margin for that.
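
As a rough, hedged breakdown for the kdump-compressed format (component
names from makedumpfile's diskdump format, sizes approximate):

    file size ~= disk_dump_header + kdump_sub_header + ELF notes
               + 2 * roundup(max_mapnr / 8, blocksize)   /* 1st + 2nd bitmaps */
               + dumpable_pages * sizeof(page_desc_t)    /* page descriptors  */
               + dumpable_pages * PAGE_SIZE * ratio      /* page data, ratio <= 1 */

so even a perfectly known page count understates the final size by the
header, note, bitmap and descriptor overhead mentioned above.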

To be fail-safe, it should also address the ENOSPC case more. Sadly,
preparing a disk that is too small for the vmcore is a human error we
cannot avoid in the real world. It is important to keep the vmcore valid
even in case of ENOSPC, in the sense that at least the generated part of
the vmcore can be correctly analyzed by crash. In this direction, I
previously sent a patch to create the 1st bitmap first, but that patch
alone is still not enough to deal with the issue; it is also necessary
to flush the 2nd bitmap and the data left in caches.

>>
>> Could you remake your patch, Baoquan?
>>

-- 
Thanks.
HATAYAMA, Daisuke


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-07-04  1:53               ` HATAYAMA, Daisuke
@ 2014-07-04  9:44                 ` bhe
  2014-07-11 13:51                 ` Vivek Goyal
  1 sibling, 0 replies; 16+ messages in thread
From: bhe @ 2014-07-04  9:44 UTC (permalink / raw)
  To: HATAYAMA, Daisuke; +Cc: kexec, Atsushi Kumagai, vgoyal

On 07/04/14 at 10:53am, HATAYAMA, Daisuke wrote:
> 
> There are things that make users to guess actual vmcore size; not only compression but also additional meta-data, note data, header, bitmaps, of vmcores. I think it important to stress that the actual dump size should be larger than the number of pages displayed there so you (users) should care about that enough.
> 

Yes, if the dump uses kdump compression, users need to know that the
actual vmcore includes more than the page data. If we only show the
number of dumpable pages, the size should be very close to the ELF
vmcore size. But I think it's a good idea to tell users about the
difference.

> For fail safe, it should address ENOSPC case more. Sadly, preparing too small disks for vmcores is human error. In general, we cannot avoid this in real world. It's important to make vmcore valid even in case of ENOSPC in the sense that at least generated part of vmcore can correctly be analized by crash. In this direction, I previously sent the patch to create 1st bitmap first but this patch alone is still unsatisfactory to deal with the issue. It's necessary to flush the 2nd bitmap and the data left in caches too.
> 

Yes, it makes sense to me to generate a partial vmcore even in case of
ENOSPC; maybe further attempts can be made in this direction later.


> 
> -- 
> Thanks.
> HATAYAMA, Daisuke
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Makedumpfile: vmcore size estimate
  2014-07-04  1:53               ` HATAYAMA, Daisuke
  2014-07-04  9:44                 ` bhe
@ 2014-07-11 13:51                 ` Vivek Goyal
  1 sibling, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2014-07-11 13:51 UTC (permalink / raw)
  To: HATAYAMA, Daisuke; +Cc: kexec, Atsushi Kumagai, bhe

On Fri, Jul 04, 2014 at 10:53:35AM +0900, HATAYAMA, Daisuke wrote:

[..]
> For fail safe, it should address ENOSPC case more. Sadly, preparing too small disks for vmcores is human error. In general, we cannot avoid this in real world. It's important to make vmcore valid even in case of ENOSPC in the sense that at least generated part of vmcore can correctly be analized by crash.

I agree. Dealing with ENOSPC makes sense. Save as much as possible and
display a warning message that the dump got truncated. The dump file
should still be readable by "crash". That way we will not lose the whole
dump, and whatever has been saved might be sufficient to analyze the
problem.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-07-11 13:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-11 12:39 [PATCH] Makedumpfile: vmcore size estimate Baoquan He
2014-06-11 13:57 ` Baoquan He
2014-06-20  1:07   ` Atsushi Kumagai
2014-06-20  1:58     ` bhe
2014-06-20  2:33     ` bhe
2014-06-23 12:57     ` Vivek Goyal
2014-06-26  8:21       ` Atsushi Kumagai
2014-06-26 12:31         ` Vivek Goyal
2014-07-02  0:25           ` Atsushi Kumagai
2014-07-02  8:13             ` bhe
2014-07-04  1:53               ` HATAYAMA, Daisuke
2014-07-04  9:44                 ` bhe
2014-07-11 13:51                 ` Vivek Goyal
2014-06-23 12:36 ` Vivek Goyal
2014-06-23 13:05   ` Baoquan He
2014-06-23 13:55     ` Anders Rayner-Karlsson
