All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] makedumpfile: Fix handling of ELF segments with p_memsz > p_filesz
@ 2016-02-10  7:47 Petr Tesarik
  2016-02-10  7:49 ` [PATCH 1/3] makedumpfile: Keep segment memory size when re-filtering ELF dumps Petr Tesarik
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Petr Tesarik @ 2016-02-10  7:47 UTC (permalink / raw)
  To: Atsushi Kumagai, Ivan Delalande; +Cc: kexec mailing list

Hello Ivan & Atsushi,

after spending some time on this, I think I have a complete working
solution. I chose to rewrite readpage_elf from scratch. Unfortunately,
I'm unable to verify that it still works the same (sans bugs) because
of a regression in commit d18796d090623d18f46c8dc608dcad9960fbbe9b
(reported on the mailing list).

I was able to verify that --dump-dmesg works on a variety of weird ELF
files that I created manually (and the output hasn't changed). In my
opinion, this is enough to verify that the read routine works, but the
final decision is up to you.

Petr Tesarik

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] makedumpfile: Keep segment memory size when re-filtering ELF dumps
  2016-02-10  7:47 [PATCH 0/3] makedumpfile: Fix handling of ELF segments with p_memsz > p_filesz Petr Tesarik
@ 2016-02-10  7:49 ` Petr Tesarik
  2016-02-10  7:49 ` [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered Petr Tesarik
  2016-02-10  7:50 ` [PATCH 3/3] makedumpfile: Rewrite readpage_elf Petr Tesarik
  2 siblings, 0 replies; 15+ messages in thread
From: Petr Tesarik @ 2016-02-10  7:49 UTC (permalink / raw)
  To: Atsushi Kumagai, Ivan Delalande; +Cc: kexec mailing list

Originally, makedumpfile was designed to read from /proc/vmcore, where
each segment's p_memsz is equal to its p_filesz (see below). However,
makedumpfile can also be used to re-filter an already filtered ELF dump
file, where memory size may be larger than file size. In that case the
memory size should be used as the size of the segment. This affects:

1. max_mapnr
   This value is computed as the highest phys_end. If the last segment
   was filtered, max_mapnr may be too small, and the crash utility will
   refuse to read that memory (even with --zero_excluded).

2. p_memsz field in ELF dumps
   The resulting ELF segment p_memsz will be capped to original file's
   p_filesz, ignoring the original p_memsz.

3. memory holes in KDUMP dumps
   Pages excluded in the original ELF dump will be appear as memory
   holes in the resulting KDUMP file's first bitmap.

4. vtop translation
   Virtual addresses that were filtered out in the original ELF file
   cannot be translated to physical addresses using the generic vtop
   translation.

My fix uses p_memsz to set physical and virtual extents of ELF segments,
because this is their actual size. However, file size is important when
accessing page data, so it must be stored separately and checked when
translating a physical address to a file offset.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>

---
 elf_info.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/elf_info.c
+++ b/elf_info.c
@@ -38,6 +38,7 @@
 
 struct pt_load_segment {
 	off_t			file_offset;
+	off_t			file_size;
 	unsigned long long	phys_start;
 	unsigned long long	phys_end;
 	unsigned long long	virt_start;
@@ -162,10 +163,11 @@ dump_Elf_load(Elf64_Phdr *prog, int num_
 
 	pls = &pt_loads[num_load];
 	pls->phys_start  = prog->p_paddr;
-	pls->phys_end    = pls->phys_start + prog->p_filesz;
+	pls->phys_end    = pls->phys_start + prog->p_memsz;
 	pls->virt_start  = prog->p_vaddr;
-	pls->virt_end    = pls->virt_start + prog->p_filesz;
+	pls->virt_end    = pls->virt_start + prog->p_memsz;
 	pls->file_offset = prog->p_offset;
+	pls->file_size   = prog->p_filesz;
 
 	DEBUG_MSG("LOAD (%d)\n", num_load);
 	DEBUG_MSG("  phys_start : %llx\n", pls->phys_start);
@@ -462,7 +464,7 @@ paddr_to_offset(unsigned long long paddr
 	for (i = offset = 0; i < num_pt_loads; i++) {
 		pls = &pt_loads[i];
 		if ((paddr >= pls->phys_start)
-		    && (paddr < pls->phys_end)) {
+		    && (paddr < pls->phys_start + pls->file_size)) {
 			offset = (off_t)(paddr - pls->phys_start) +
 				pls->file_offset;
 			break;
@@ -480,16 +482,14 @@ paddr_to_offset2(unsigned long long padd
 {
 	int i;
 	off_t offset;
-	unsigned long long len;
 	struct pt_load_segment *pls;
 
 	for (i = offset = 0; i < num_pt_loads; i++) {
 		pls = &pt_loads[i];
-		len = pls->phys_end - pls->phys_start;
 		if ((paddr >= pls->phys_start)
-		    && (paddr < pls->phys_end)
+		    && (paddr < pls->phys_start + pls->file_size)
 		    && (hint >= pls->file_offset)
-		    && (hint < pls->file_offset + len)) {
+		    && (hint < pls->file_offset + pls->file_size)) {
 			offset = (off_t)(paddr - pls->phys_start) +
 				pls->file_offset;
 			break;

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered
  2016-02-10  7:47 [PATCH 0/3] makedumpfile: Fix handling of ELF segments with p_memsz > p_filesz Petr Tesarik
  2016-02-10  7:49 ` [PATCH 1/3] makedumpfile: Keep segment memory size when re-filtering ELF dumps Petr Tesarik
@ 2016-02-10  7:49 ` Petr Tesarik
  2016-05-18  7:44   ` Atsushi Kumagai
  2016-02-10  7:50 ` [PATCH 3/3] makedumpfile: Rewrite readpage_elf Petr Tesarik
  2 siblings, 1 reply; 15+ messages in thread
From: Petr Tesarik @ 2016-02-10  7:49 UTC (permalink / raw)
  To: Atsushi Kumagai, Ivan Delalande; +Cc: kexec mailing list

Pages that were filtered in the original dump file should also be filtered
in the destination file, i.e. pages between p_filesz and p_memsz must have
their corresponding bit cleared in the 2nd bitmap. Note that in theory,
such ELF segments might not refer to filtered pages, because the crash
kernel copies the program headers verbatim from elfcorehdr=. However, all
common sources of /proc/vmcore program headers use the same value for each
p_memsz and p_filesz:

a. kexec(8)
   Program headers are created in two places, but in both cases the value
   of p_memsz is equal to p_filesz.
   Reference: kexec/crashdump-elf.c in kexec-tools

b. kexec_file_load(2)
   Conceptually the same code as kexec(8): two places, both set p_filesz
   and p_memsz to the same value.
   Reference: kexec/crashdump-elf.c in Linux kernel

c. created in the crash kernel (s390)
   On s390, program headers are created in the crash kernel, but both
   p_filesz and p_memsz are set to "end - start", i.e. the same value.
   Reference: arch/s390/kernel/crash_dump.c in Linux kernel

The above means that p_memsz > p_filesz only in ELF files that were
processed by makedumpfile, hence it can be safely assumed that pages
between p_filesz and p_memsz were filtered out.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>

---
 elf_info.c     |   26 ++++++++++++++++++++++++++
 elf_info.h     |    5 +++++
 makedumpfile.c |   32 ++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

--- a/elf_info.c
+++ b/elf_info.c
@@ -1096,6 +1096,32 @@ get_pt_load(int idx,
 	return TRUE;
 }
 
+int
+get_pt_load_extents(int idx,
+	unsigned long long *phys_start,
+	unsigned long long *phys_end,
+	off_t *file_offset,
+	off_t *file_size)
+{
+	struct pt_load_segment *pls;
+
+	if (num_pt_loads <= idx)
+		return FALSE;
+
+	pls = &pt_loads[idx];
+
+	if (phys_start)
+		*phys_start  = pls->phys_start;
+	if (phys_end)
+		*phys_end    = pls->phys_end;
+	if (file_offset)
+		*file_offset = pls->file_offset;
+	if (file_size)
+		*file_size   = pls->file_size;
+
+	return TRUE;
+}
+
 unsigned int
 get_num_pt_loads(void)
 {
--- a/elf_info.h
+++ b/elf_info.h
@@ -61,6 +61,11 @@ int get_pt_load(int idx,
 	unsigned long long *phys_end,
 	unsigned long long *virt_start,
 	unsigned long long *virt_end);
+int get_pt_load_extents(int idx,
+	unsigned long long *phys_start,
+	unsigned long long *phys_end,
+	off_t *file_offset,
+	off_t *file_size);
 unsigned int get_num_pt_loads(void);
 
 void set_nr_cpus(int num);
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -4395,6 +4395,32 @@ read_start_flat_header(void)
 	return TRUE;
 }
 
+static void
+exclude_nodata_pages(struct cycle *cycle)
+{
+	int i;
+	unsigned long long phys_start, phys_end;
+	off_t file_size;
+
+	i = 0;
+	while (get_pt_load_extents(i, &phys_start, &phys_end,
+				   NULL, &file_size)) {
+		unsigned long long pfn, pfn_end;
+
+		pfn = paddr_to_pfn(phys_start + file_size);
+		pfn_end = paddr_to_pfn(phys_end);
+		if (pfn < cycle->start_pfn)
+			pfn = cycle->start_pfn;
+		if (pfn_end >= cycle->end_pfn)
+			pfn_end = cycle->end_pfn - 1;
+		while (pfn <= pfn_end) {
+			clear_bit_on_2nd_bitmap(pfn, cycle);
+			++pfn;
+		}
+		++i;
+	}
+}
+
 int
 read_flat_data_header(struct makedumpfile_data_header *fdh)
 {
@@ -6087,6 +6113,12 @@ create_2nd_bitmap(struct cycle *cycle)
 	}
 
 	/*
+	 * If re-filtering ELF dump, exclude pages that were already
+	 * excluded in the original file.
+	 */
+	exclude_nodata_pages(cycle);
+
+	/*
 	 * Exclude cache pages, cache private pages, user data pages,
 	 * and hwpoison pages.
 	 */

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/3] makedumpfile: Rewrite readpage_elf
  2016-02-10  7:47 [PATCH 0/3] makedumpfile: Fix handling of ELF segments with p_memsz > p_filesz Petr Tesarik
  2016-02-10  7:49 ` [PATCH 1/3] makedumpfile: Keep segment memory size when re-filtering ELF dumps Petr Tesarik
  2016-02-10  7:49 ` [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered Petr Tesarik
@ 2016-02-10  7:50 ` Petr Tesarik
  2016-02-10 22:18   ` Ivan Delalande
  2016-02-29 18:55   ` Petr Tesarik
  2 siblings, 2 replies; 15+ messages in thread
From: Petr Tesarik @ 2016-02-10  7:50 UTC (permalink / raw)
  To: Atsushi Kumagai, Ivan Delalande; +Cc: kexec mailing list

The current code in readpage_elf (and readpage_elf_parallel) is extremely
hard to follow. Additionally, it still does not cover all possible cases.
For example, attempts to read outside of any ELF segment will end up with
phys_start being 0, frac_head a negative number, interpreted as a large
positive number by memset() and write past buffer end.

Instead of trying to handle even more "corner cases", I rewrote the
algorithm from scratch. The basic idea is simple: set a goal to fill the
page buffer with data, then work towards that goal by:

  - filling holes with zeroes (see Note below),
  - p_filesz portions with file data and
  - remaining p_memsz portions again with zeroes.

Repeat this method for each LOAD until the goal is achieved, or an error
occurs. In most cases, the loop runs only once.

Note: A "hole" is data at a physical address that is not covered by any
ELF LOAD program header. In other words, the ELF file does not specify
any data for such a hole (not even zeroes). So, why does makedumpfile
fill them with zeroes? It's because makedumpfile works with page
granularity (the compressed format does not even have a way to store
a partial page), so if only part of a page is stored, a complete page
must be provided to make this partial data accessible.

Credits to Ivan Delalande <colona@arista.com> who first found the
problem and wrote the original fix.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>

---
 elf_info.c     |   28 +++++++
 elf_info.h     |    1 
 makedumpfile.c |  208 ++++++++++++++++++++++-----------------------------------
 3 files changed, 110 insertions(+), 127 deletions(-)

--- a/elf_info.c
+++ b/elf_info.c
@@ -691,6 +691,34 @@ get_max_paddr(void)
 	return max_paddr;
 }
 
+/*
+ * Find the LOAD segment which is closest to the requested
+ * physical address within a given distance.
+ *  If there is no such segment, return a negative number.
+ */
+int
+closest_pt_load(unsigned long long paddr, unsigned long distance)
+{
+	int i, bestidx;
+	struct pt_load_segment *pls;
+	unsigned long bestdist;
+
+	bestdist = distance;
+	bestidx = -1;
+	for (i = 0; i < num_pt_loads; ++i) {
+		pls = &pt_loads[i];
+		if (paddr >= pls->phys_end)
+			continue;
+		if (paddr >= pls->phys_start)
+			return i;	/* Exact match */
+		if (bestdist > pls->phys_start - paddr) {
+			bestdist = pls->phys_start - paddr;
+			bestidx = i;
+		}
+	}
+	return bestidx;
+}
+
 int
 get_elf64_ehdr(int fd, char *filename, Elf64_Ehdr *ehdr)
 {
--- a/elf_info.h
+++ b/elf_info.h
@@ -39,6 +39,7 @@ off_t offset_to_pt_load_end(off_t offset
 unsigned long long vaddr_to_paddr_general(unsigned long long vaddr);
 off_t vaddr_to_offset_slow(int fd, char *filename, unsigned long long vaddr);
 unsigned long long get_max_paddr(void);
+int closest_pt_load(unsigned long long paddr, unsigned long distance);
 
 int page_is_fractional(off_t page_offset);
 
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -648,73 +648,51 @@ read_from_vmcore_parallel(int fd_memory,
 static int
 readpage_elf(unsigned long long paddr, void *bufptr)
 {
-	off_t offset1, offset2;
-	size_t size1, size2;
-	unsigned long long phys_start, phys_end, frac_head = 0;
-
-	offset1 = paddr_to_offset(paddr);
-	offset2 = paddr_to_offset(paddr + info->page_size);
-	phys_start = paddr;
-	phys_end = paddr + info->page_size;
-
-	/*
-	 * Check the case phys_start isn't aligned by page size like below:
-	 *
-	 *                           phys_start
-	 *                           = 0x40ffda7000
-	 *         |<-- frac_head -->|------------- PT_LOAD -------------
-	 *     ----+-----------------------+---------------------+----
-	 *         |         pfn:N         |       pfn:N+1       | ...
-	 *     ----+-----------------------+---------------------+----
-	 *         |
-	 *     pfn_to_paddr(pfn:N)               # page size = 16k
-	 *     = 0x40ffda4000
-	 */
-	if (!offset1) {
-		phys_start = page_head_to_phys_start(paddr);
-		offset1 = paddr_to_offset(phys_start);
-		frac_head = phys_start - paddr;
-		memset(bufptr, 0, frac_head);
-	}
-
-	/*
-	 * Check the case phys_end isn't aligned by page size like the
-	 * phys_start's case.
-	 */
-	if (!offset2) {
-		phys_end = page_head_to_phys_end(paddr);
-		offset2 = paddr_to_offset(phys_end);
-		memset(bufptr + (phys_end - paddr), 0, info->page_size - (phys_end - paddr));
-	}
+	int idx;
+	off_t offset, size;
+	void *p, *endp;
+	unsigned long long phys_start, phys_end;
+
+	p = bufptr;
+	endp = p + info->page_size;
+	while (p < endp) {
+		idx = closest_pt_load(paddr + (p - bufptr), endp - p);
+		if (idx < 0)
+			break;
+
+		get_pt_load_extents(idx, &phys_start, &phys_end, &offset, &size);
+		if (phys_start > paddr + (p - bufptr)) {
+			memset(p, 0, phys_start - paddr);
+			p += phys_start - paddr;
+		}
 
-	/*
-	 * Check the separated page on different PT_LOAD segments.
-	 */
-	if (offset1 + (phys_end - phys_start) == offset2) {
-		size1 = phys_end - phys_start;
-	} else {
-		for (size1 = 1; size1 < info->page_size - frac_head; size1++) {
-			offset2 = paddr_to_offset(phys_start + size1);
-			if (offset1 + size1 != offset2)
-				break;
+		offset += paddr - phys_start;
+		if (size > paddr - phys_start) {
+			size -= paddr - phys_start;
+			if (size > endp - p)
+				size = endp - p;
+			if (!read_from_vmcore(offset, p, size)) {
+				ERRMSG("Can't read the dump memory(%s).\n",
+				       info->name_memory);
+				return FALSE;
+			}
+			p += size;
+		}
+		if (p < endp) {
+			size = phys_end - paddr;
+			if (size > endp - p)
+				size = endp - p;
+			memset(p, 0, size);
+			p += size;
 		}
 	}
 
-	if(!read_from_vmcore(offset1, bufptr + frac_head, size1)) {
-		ERRMSG("Can't read the dump memory(%s).\n",
-		       info->name_memory);
+	if (p == bufptr) {
+		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
+		       paddr);
 		return FALSE;
-	}
-
-	if (size1 + frac_head != info->page_size) {
-		size2 = phys_end - (phys_start + size1);
-
-		if(!read_from_vmcore(offset2, bufptr + frac_head + size1, size2)) {
-			ERRMSG("Can't read the dump memory(%s).\n",
-			       info->name_memory);
-			return FALSE;
-		}
-	}
+	} else if (p < bufptr)
+		memset(p, 0, endp - p);
 
 	return TRUE;
 }
@@ -722,76 +700,52 @@ readpage_elf(unsigned long long paddr, v
 static int
 readpage_elf_parallel(int fd_memory, unsigned long long paddr, void *bufptr)
 {
-	off_t offset1, offset2;
-	size_t size1, size2;
-	unsigned long long phys_start, phys_end, frac_head = 0;
-
-	offset1 = paddr_to_offset(paddr);
-	offset2 = paddr_to_offset(paddr + info->page_size);
-	phys_start = paddr;
-	phys_end = paddr + info->page_size;
-
-	/*
-	 * Check the case phys_start isn't aligned by page size like below:
-	 *
-	 *                           phys_start
-	 *                           = 0x40ffda7000
-	 *         |<-- frac_head -->|------------- PT_LOAD -------------
-	 *     ----+-----------------------+---------------------+----
-	 *         |         pfn:N         |       pfn:N+1       | ...
-	 *     ----+-----------------------+---------------------+----
-	 *         |
-	 *     pfn_to_paddr(pfn:N)               # page size = 16k
-	 *     = 0x40ffda4000
-	 */
-	if (!offset1) {
-		phys_start = page_head_to_phys_start(paddr);
-		offset1 = paddr_to_offset(phys_start);
-		frac_head = phys_start - paddr;
-		memset(bufptr, 0, frac_head);
-	}
-
-	/*
-	 * Check the case phys_end isn't aligned by page size like the
-	 * phys_start's case.
-	 */
-	if (!offset2) {
-		phys_end = page_head_to_phys_end(paddr);
-		offset2 = paddr_to_offset(phys_end);
-		memset(bufptr + (phys_end - paddr), 0, info->page_size
-							- (phys_end - paddr));
-	}
+	int idx;
+	off_t offset, size;
+	void *p, *endp;
+	unsigned long long phys_start, phys_end;
+
+	p = bufptr;
+	endp = p + info->page_size;
+	while (p < endp) {
+		idx = closest_pt_load(paddr + (p - bufptr), endp - p);
+		if (idx < 0)
+			break;
+
+		get_pt_load_extents(idx, &phys_start, &phys_end, &offset, &size);
+		if (phys_start > paddr + (p - bufptr)) {
+			memset(p, 0, phys_start - paddr);
+			p += phys_start - paddr;
+		}
 
-	/*
-	 * Check the separated page on different PT_LOAD segments.
-	 */
-	if (offset1 + (phys_end - phys_start) == offset2) {
-		size1 = phys_end - phys_start;
-	} else {
-		for (size1 = 1; size1 < info->page_size - frac_head; size1++) {
-			offset2 = paddr_to_offset(phys_start + size1);
-			if (offset1 + size1 != offset2)
-				break;
+		offset += paddr - phys_start;
+		if (size > paddr - phys_start) {
+			size -= paddr - phys_start;
+			if (size > endp - p)
+				size = endp - p;
+			if (!read_from_vmcore_parallel(fd_memory, offset, bufptr,
+						       size)) {
+				ERRMSG("Can't read the dump memory(%s).\n",
+				       info->name_memory);
+				return FALSE;
+			}
+			p += size;
+		}
+		if (p < endp) {
+			size = phys_end - paddr;
+			if (size > endp - p)
+				size = endp - p;
+			memset(p, 0, size);
+			p += size;
 		}
 	}
 
-	if(!read_from_vmcore_parallel(fd_memory, offset1, bufptr + frac_head,
-								size1)) {
-		ERRMSG("Can't read the dump memory(%s).\n",
-		       info->name_memory);
+	if (p == bufptr) {
+		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
+		       paddr);
 		return FALSE;
-	}
-
-	if (size1 + frac_head != info->page_size) {
-		size2 = phys_end - (phys_start + size1);
-
-		if(!read_from_vmcore_parallel(fd_memory, offset2,
-					bufptr + frac_head + size1, size2)) {
-			ERRMSG("Can't read the dump memory(%s).\n",
-			       info->name_memory);
-			return FALSE;
-		}
-	}
+	} else if (p < bufptr)
+		memset(p, 0, endp - p);
 
 	return TRUE;
 }

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] makedumpfile: Rewrite readpage_elf
  2016-02-10  7:50 ` [PATCH 3/3] makedumpfile: Rewrite readpage_elf Petr Tesarik
@ 2016-02-10 22:18   ` Ivan Delalande
  2016-02-17  7:58     ` Atsushi Kumagai
  2016-02-29 18:55   ` Petr Tesarik
  1 sibling, 1 reply; 15+ messages in thread
From: Ivan Delalande @ 2016-02-10 22:18 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Atsushi Kumagai, kexec mailing list

On Wed, Feb 10, 2016 at 08:50:09AM +0100, Petr Tesarik wrote:
> The current code in readpage_elf (and readpage_elf_parallel) is extremely
> hard to follow. Additionally, it still does not cover all possible cases.
> For example, attempts to read outside of any ELF segment will end up with
> phys_start being 0, frac_head a negative number, interpreted as a large
> positive number by memset() and write past buffer end.
> 
> Instead of trying to handle even more "corner cases", I rewrote the
> algorithm from scratch. The basic idea is simple: set a goal to fill the
> page buffer with data, then work towards that goal by:
> 
>   - filling holes with zeroes (see Note below),
>   - p_filesz portions with file data and
>   - remaining p_memsz portions again with zeroes.
> 
> Repeat this method for each LOAD until the goal is achieved, or an error
> occurs. In most cases, the loop runs only once.
> 
> Note: A "hole" is data at a physical address that is not covered by any
> ELF LOAD program header. In other words, the ELF file does not specify
> any data for such a hole (not even zeroes). So, why does makedumpfile
> fill them with zeroes? It's because makedumpfile works with page
> granularity (the compressed format does not even have a way to store
> a partial page), so if only part of a page is stored, a complete page
> must be provided to make this partial data accessible.
> 
> Credits to Ivan Delalande <colona@arista.com> who first found the
> problem and wrote the original fix.
> 
> Signed-off-by: Petr Tesarik <ptesarik@suse.com>

Tested-by: Ivan Delalande <colona@arista.com>

Dump-dmesg works well and gives the expected results with our various
setups (x86_64 only). Thanks for your work Petr!

-- 
Ivan "Colona" Delalande
Arista Networks

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 3/3] makedumpfile: Rewrite readpage_elf
  2016-02-10 22:18   ` Ivan Delalande
@ 2016-02-17  7:58     ` Atsushi Kumagai
  0 siblings, 0 replies; 15+ messages in thread
From: Atsushi Kumagai @ 2016-02-17  7:58 UTC (permalink / raw)
  To: Ivan Delalande, Petr Tesarik; +Cc: kexec mailing list

>On Wed, Feb 10, 2016 at 08:50:09AM +0100, Petr Tesarik wrote:
>> The current code in readpage_elf (and readpage_elf_parallel) is extremely
>> hard to follow. Additionally, it still does not cover all possible cases.
>> For example, attempts to read outside of any ELF segment will end up with
>> phys_start being 0, frac_head a negative number, interpreted as a large
>> positive number by memset() and write past buffer end.
>>
>> Instead of trying to handle even more "corner cases", I rewrote the
>> algorithm from scratch. The basic idea is simple: set a goal to fill the
>> page buffer with data, then work towards that goal by:
>>
>>   - filling holes with zeroes (see Note below),
>>   - p_filesz portions with file data and
>>   - remaining p_memsz portions again with zeroes.
>>
>> Repeat this method for each LOAD until the goal is achieved, or an error
>> occurs. In most cases, the loop runs only once.
>>
>> Note: A "hole" is data at a physical address that is not covered by any
>> ELF LOAD program header. In other words, the ELF file does not specify
>> any data for such a hole (not even zeroes). So, why does makedumpfile
>> fill them with zeroes? It's because makedumpfile works with page
>> granularity (the compressed format does not even have a way to store
>> a partial page), so if only part of a page is stored, a complete page
>> must be provided to make this partial data accessible.
>>
>> Credits to Ivan Delalande <colona@arista.com> who first found the
>> problem and wrote the original fix.
>>
>> Signed-off-by: Petr Tesarik <ptesarik@suse.com>
>
>Tested-by: Ivan Delalande <colona@arista.com>
>
>Dump-dmesg works well and gives the expected results with our various
>setups (x86_64 only). Thanks for your work Petr!

Thanks for your great work, Petr and Ivan.
I'll merge this patch set into v1.6.0.


Thanks,
Atsushi Kumagai

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] makedumpfile: Rewrite readpage_elf
  2016-02-10  7:50 ` [PATCH 3/3] makedumpfile: Rewrite readpage_elf Petr Tesarik
  2016-02-10 22:18   ` Ivan Delalande
@ 2016-02-29 18:55   ` Petr Tesarik
  2016-02-29 18:59     ` [PATCH] makedumpfile: When reading partial ELF pages, check final pointer against buffer end Petr Tesarik
  1 sibling, 1 reply; 15+ messages in thread
From: Petr Tesarik @ 2016-02-29 18:55 UTC (permalink / raw)
  To: Atsushi Kumagai, Ivan Delalande; +Cc: kexec mailing list

Hi,

I just found a silly typo in my patch, see below.

On Wed, 10 Feb 2016 08:50:09 +0100
Petr Tesarik <ptesarik@suse.cz> wrote:

> The current code in readpage_elf (and readpage_elf_parallel) is extremely
> hard to follow. Additionally, it still does not cover all possible cases.
> For example, attempts to read outside of any ELF segment will end up with
> phys_start being 0, frac_head a negative number, interpreted as a large
> positive number by memset() and write past buffer end.
> 
> Instead of trying to handle even more "corner cases", I rewrote the
> algorithm from scratch. The basic idea is simple: set a goal to fill the
> page buffer with data, then work towards that goal by:
> 
>   - filling holes with zeroes (see Note below),
>   - p_filesz portions with file data and
>   - remaining p_memsz portions again with zeroes.
> 
> Repeat this method for each LOAD until the goal is achieved, or an error
> occurs. In most cases, the loop runs only once.
> 
> Note: A "hole" is data at a physical address that is not covered by any
> ELF LOAD program header. In other words, the ELF file does not specify
> any data for such a hole (not even zeroes). So, why does makedumpfile
> fill them with zeroes? It's because makedumpfile works with page
> granularity (the compressed format does not even have a way to store
> a partial page), so if only part of a page is stored, a complete page
> must be provided to make this partial data accessible.
> 
> Credits to Ivan Delalande <colona@arista.com> who first found the
> problem and wrote the original fix.
> 
> Signed-off-by: Petr Tesarik <ptesarik@suse.com>
> 
> ---
>  elf_info.c     |   28 +++++++
>  elf_info.h     |    1 
>  makedumpfile.c |  208 ++++++++++++++++++++++-----------------------------------
>  3 files changed, 110 insertions(+), 127 deletions(-)
> 
>[...]
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -648,73 +648,51 @@ read_from_vmcore_parallel(int fd_memory,
>  static int
>  readpage_elf(unsigned long long paddr, void *bufptr)
>  {
> -	off_t offset1, offset2;
> -	size_t size1, size2;
> -	unsigned long long phys_start, phys_end, frac_head = 0;
> -
> -	offset1 = paddr_to_offset(paddr);
> -	offset2 = paddr_to_offset(paddr + info->page_size);
> -	phys_start = paddr;
> -	phys_end = paddr + info->page_size;
> -
> -	/*
> -	 * Check the case phys_start isn't aligned by page size like below:
> -	 *
> -	 *                           phys_start
> -	 *                           = 0x40ffda7000
> -	 *         |<-- frac_head -->|------------- PT_LOAD -------------
> -	 *     ----+-----------------------+---------------------+----
> -	 *         |         pfn:N         |       pfn:N+1       | ...
> -	 *     ----+-----------------------+---------------------+----
> -	 *         |
> -	 *     pfn_to_paddr(pfn:N)               # page size = 16k
> -	 *     = 0x40ffda4000
> -	 */
> -	if (!offset1) {
> -		phys_start = page_head_to_phys_start(paddr);
> -		offset1 = paddr_to_offset(phys_start);
> -		frac_head = phys_start - paddr;
> -		memset(bufptr, 0, frac_head);
> -	}
> -
> -	/*
> -	 * Check the case phys_end isn't aligned by page size like the
> -	 * phys_start's case.
> -	 */
> -	if (!offset2) {
> -		phys_end = page_head_to_phys_end(paddr);
> -		offset2 = paddr_to_offset(phys_end);
> -		memset(bufptr + (phys_end - paddr), 0, info->page_size - (phys_end - paddr));
> -	}
> +	int idx;
> +	off_t offset, size;
> +	void *p, *endp;
> +	unsigned long long phys_start, phys_end;
> +
> +	p = bufptr;
> +	endp = p + info->page_size;
> +	while (p < endp) {
> +		idx = closest_pt_load(paddr + (p - bufptr), endp - p);
> +		if (idx < 0)
> +			break;
> +
> +		get_pt_load_extents(idx, &phys_start, &phys_end, &offset, &size);
> +		if (phys_start > paddr + (p - bufptr)) {
> +			memset(p, 0, phys_start - paddr);
> +			p += phys_start - paddr;
> +		}
>  
> -	/*
> -	 * Check the separated page on different PT_LOAD segments.
> -	 */
> -	if (offset1 + (phys_end - phys_start) == offset2) {
> -		size1 = phys_end - phys_start;
> -	} else {
> -		for (size1 = 1; size1 < info->page_size - frac_head; size1++) {
> -			offset2 = paddr_to_offset(phys_start + size1);
> -			if (offset1 + size1 != offset2)
> -				break;
> +		offset += paddr - phys_start;
> +		if (size > paddr - phys_start) {
> +			size -= paddr - phys_start;
> +			if (size > endp - p)
> +				size = endp - p;
> +			if (!read_from_vmcore(offset, p, size)) {
> +				ERRMSG("Can't read the dump memory(%s).\n",
> +				       info->name_memory);
> +				return FALSE;
> +			}
> +			p += size;
> +		}
> +		if (p < endp) {
> +			size = phys_end - paddr;
> +			if (size > endp - p)
> +				size = endp - p;
> +			memset(p, 0, size);
> +			p += size;
>  		}
>  	}
>  
> -	if(!read_from_vmcore(offset1, bufptr + frac_head, size1)) {
> -		ERRMSG("Can't read the dump memory(%s).\n",
> -		       info->name_memory);
> +	if (p == bufptr) {
> +		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
> +		       paddr);
>  		return FALSE;
> -	}
> -
> -	if (size1 + frac_head != info->page_size) {
> -		size2 = phys_end - (phys_start + size1);
> -
> -		if(!read_from_vmcore(offset2, bufptr + frac_head + size1, size2)) {
> -			ERRMSG("Can't read the dump memory(%s).\n",
> -			       info->name_memory);
> -			return FALSE;
> -		}
> -	}
> +	} else if (p < bufptr)

Here, of course, it should be:

	} else if (p < endp)

> +		memset(p, 0, endp - p);
>  
>  	return TRUE;
>  }

And same in the second hunk.

I'm sending a patch in a minute.

Petr Tesarik

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] makedumpfile: When reading partial ELF pages, check final pointer against buffer end
  2016-02-29 18:55   ` Petr Tesarik
@ 2016-02-29 18:59     ` Petr Tesarik
  2016-02-29 19:40       ` Petr Tesarik
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Tesarik @ 2016-02-29 18:59 UTC (permalink / raw)
  To: Atsushi Kumagai, Ivan Delalande; +Cc: kexec mailing list

If the last part of a page is not present in the ELF file, it should
be replaced with zeroes. However, the check is incorrect.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>

diff --git a/makedumpfile.c b/makedumpfile.c
index 867b953..138ddec 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -691,7 +691,7 @@ readpage_elf(unsigned long long paddr, void *bufptr)
 		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
 		       paddr);
 		return FALSE;
-	} else if (p < bufptr)
+	} else if (p < endp)
 		memset(p, 0, endp - p);
 
 	return TRUE;
@@ -744,7 +744,7 @@ readpage_elf_parallel(int fd_memory, unsigned long long paddr, void *bufptr)
 		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
 		       paddr);
 		return FALSE;
-	} else if (p < bufptr)
+	} else if (p < endp)
 		memset(p, 0, endp - p);
 
 	return TRUE;

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] makedumpfile: When reading partial ELF pages, check final pointer against buffer end
  2016-02-29 18:59     ` [PATCH] makedumpfile: When reading partial ELF pages, check final pointer against buffer end Petr Tesarik
@ 2016-02-29 19:40       ` Petr Tesarik
  2016-03-08 11:58         ` [PATCH] makedumpfile: Fix several issues with reading ELF pages Petr Tesarik
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Tesarik @ 2016-02-29 19:40 UTC (permalink / raw)
  To: Atsushi Kumagai, Ivan Delalande; +Cc: kexec mailing list

Hi again,

On Mon, 29 Feb 2016 19:59:52 +0100
Petr Tesarik <ptesarik@suse.cz> wrote:

> If the last part of a page is not present in the ELF file, it should
> be replaced with zeroes. However, the check is incorrect.

While this fix is correct, it seems there are a few more errors in the
logic. I found this issues while adapting the code for libkdumpfile
(https://github.com/ptesarik/libkdumpfile). This project includes a
test suite, so let's postpone fixing the code in makedumpfile until I'm
done with writing a good set of tests for this other project and making
sure that all of them pass.

Stay tuned,
Petr Tesarik

> Signed-off-by: Petr Tesarik <ptesarik@suse.com>
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 867b953..138ddec 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -691,7 +691,7 @@ readpage_elf(unsigned long long paddr, void
> *bufptr) ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
>  		       paddr);
>  		return FALSE;
> -	} else if (p < bufptr)
> +	} else if (p < endp)
>  		memset(p, 0, endp - p);
>  
>  	return TRUE;
> @@ -744,7 +744,7 @@ readpage_elf_parallel(int fd_memory, unsigned
> long long paddr, void *bufptr) ERRMSG("Attempt to read non-existent
> page at 0x%llx.\n", paddr);
>  		return FALSE;
> -	} else if (p < bufptr)
> +	} else if (p < endp)
>  		memset(p, 0, endp - p);
>  
>  	return TRUE;


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] makedumpfile: Fix several issues with reading ELF pages
  2016-02-29 19:40       ` Petr Tesarik
@ 2016-03-08 11:58         ` Petr Tesarik
  2016-03-11  9:16           ` Atsushi Kumagai
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Tesarik @ 2016-03-08 11:58 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec mailing list

While adopting the algorithm for libkdumpfile, several corner cases
were found by a test case:

  1. If the last part of a page is not present in the ELF file, it
     should be replaced with zeroes. However, the check is incorrect.

  2. If the beginning of a page is not present, following data is
     read from an incorrect file offset.

  3. If the page is split among several segments, the current
     position in the buffer is not always taken into account.

Case 1 is a simple typo/braino (writing bufptr instead of endp).

To fix cases 2 and 3, it is best to update the paddr variable so that
it always corresponds to the current read position in the buffer.

I have tested the new code with a specially crafted ELF dump where one
page had the following (artificial) layout:

  #  PAGE RANGE    STORED IN   DATA
  --------------------------------------------------
  1  0x000-0x007   nowhere     fake zero
  2  0x008-0x067   LOAD #1     read from file
  3  0x068-0x06f   nowhere     fake zero
  4  0x070-0x13f   LOAD #2     read from file
  5  0x140-0x147   LOAD #2     zero (memsz > filesz)
  6  0x148-0xff7   LOAD #3     read from file
  7  0xff8-0xfff   nowhere     fake zero

Case 1 tests the conditional right after get_pt_load_extents().
Case 2 tests file read after missing data.
Case 3 tests the conditional from case 1 with non-zero buffer position.
Case 5 tests the last conditional in the loop (p < endp).
Case 6 tests exact match in get_pt_load_extents().
Case 7 tests the final conditional after the loop.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>

---
 makedumpfile.c |   18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -656,14 +656,15 @@ readpage_elf(unsigned long long paddr, v
 	p = bufptr;
 	endp = p + info->page_size;
 	while (p < endp) {
-		idx = closest_pt_load(paddr + (p - bufptr), endp - p);
+		idx = closest_pt_load(paddr, endp - p);
 		if (idx < 0)
 			break;
 
 		get_pt_load_extents(idx, &phys_start, &phys_end, &offset, &size);
-		if (phys_start > paddr + (p - bufptr)) {
+		if (phys_start > paddr) {
 			memset(p, 0, phys_start - paddr);
 			p += phys_start - paddr;
+			paddr = phys_start;
 		}
 
 		offset += paddr - phys_start;
@@ -677,6 +678,7 @@ readpage_elf(unsigned long long paddr, v
 				return FALSE;
 			}
 			p += size;
+			paddr += size;
 		}
 		if (p < endp) {
 			size = phys_end - paddr;
@@ -684,6 +686,7 @@ readpage_elf(unsigned long long paddr, v
 				size = endp - p;
 			memset(p, 0, size);
 			p += size;
+			paddr += size;
 		}
 	}
 
@@ -691,7 +694,7 @@ readpage_elf(unsigned long long paddr, v
 		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
 		       paddr);
 		return FALSE;
-	} else if (p < bufptr)
+	} else if (p < endp)
 		memset(p, 0, endp - p);
 
 	return TRUE;
@@ -708,14 +711,15 @@ readpage_elf_parallel(int fd_memory, uns
 	p = bufptr;
 	endp = p + info->page_size;
 	while (p < endp) {
-		idx = closest_pt_load(paddr + (p - bufptr), endp - p);
+		idx = closest_pt_load(paddr, endp - p);
 		if (idx < 0)
 			break;
 
 		get_pt_load_extents(idx, &phys_start, &phys_end, &offset, &size);
-		if (phys_start > paddr + (p - bufptr)) {
+		if (phys_start > paddr) {
 			memset(p, 0, phys_start - paddr);
 			p += phys_start - paddr;
+			paddr = phys_start;
 		}
 
 		offset += paddr - phys_start;
@@ -730,6 +734,7 @@ readpage_elf_parallel(int fd_memory, uns
 				return FALSE;
 			}
 			p += size;
+			paddr += size;
 		}
 		if (p < endp) {
 			size = phys_end - paddr;
@@ -737,6 +742,7 @@ readpage_elf_parallel(int fd_memory, uns
 				size = endp - p;
 			memset(p, 0, size);
 			p += size;
+			paddr += size;
 		}
 	}
 
@@ -744,7 +750,7 @@ readpage_elf_parallel(int fd_memory, uns
 		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
 		       paddr);
 		return FALSE;
-	} else if (p < bufptr)
+	} else if (p < endp)
 		memset(p, 0, endp - p);
 
 	return TRUE;

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] makedumpfile: Fix several issues with reading ELF pages
  2016-03-08 11:58         ` [PATCH] makedumpfile: Fix several issues with reading ELF pages Petr Tesarik
@ 2016-03-11  9:16           ` Atsushi Kumagai
  0 siblings, 0 replies; 15+ messages in thread
From: Atsushi Kumagai @ 2016-03-11  9:16 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: kexec mailing list

Hello Petr,

Thanks for your investigation and fixing the issue.
I'll merge this into v1.6.0.

Regards,
Atsushi Kumagai

>While adopting the algorithm for libkdumpfile, several corner cases
>were found by a test case:
>
>  1. If the last part of a page is not present in the ELF file, it
>     should be replaced with zeroes. However, the check is incorrect.
>
>  2. If the beginning of a page is not present, following data is
>     read from an incorrect file offset.
>
>  3. If the page is split among several segments, the current
>     position in the buffer is not always taken into account.
>
>Case 1 is a simple typo/braino (writing bufptr instead of endp).
>
>To fix cases 2 and 3, it is best to update the paddr variable so that
>it always corresponds to the current read position in the buffer.
>
>I have tested the new code with a specially crafted ELF dump where one
>page had the following (artificial) layout:
>
>  #  PAGE RANGE    STORED IN   DATA
>  --------------------------------------------------
>  1  0x000-0x007   nowhere     fake zero
>  2  0x008-0x067   LOAD #1     read from file
>  3  0x068-0x06f   nowhere     fake zero
>  4  0x070-0x13f   LOAD #2     read from file
>  5  0x140-0x147   LOAD #2     zero (memsz > filesz)
>  6  0x148-0xff7   LOAD #3     read from file
>  7  0xff8-0xfff   nowhere     fake zero
>
>Case 1 tests the conditional right after get_pt_load_extents().
>Case 2 tests file read after missing data.
>Case 3 tests the conditional from case 1 with non-zero buffer position.
>Case 5 tests the last conditional in the loop (p < endp).
>Case 6 tests exact match in get_pt_load_extents().
>Case 7 tests the final conditional after the loop.
>
>Signed-off-by: Petr Tesarik <ptesarik@suse.com>
>
>---
> makedumpfile.c |   18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
>--- a/makedumpfile.c
>+++ b/makedumpfile.c
>@@ -656,14 +656,15 @@ readpage_elf(unsigned long long paddr, v
> 	p = bufptr;
> 	endp = p + info->page_size;
> 	while (p < endp) {
>-		idx = closest_pt_load(paddr + (p - bufptr), endp - p);
>+		idx = closest_pt_load(paddr, endp - p);
> 		if (idx < 0)
> 			break;
>
> 		get_pt_load_extents(idx, &phys_start, &phys_end, &offset, &size);
>-		if (phys_start > paddr + (p - bufptr)) {
>+		if (phys_start > paddr) {
> 			memset(p, 0, phys_start - paddr);
> 			p += phys_start - paddr;
>+			paddr = phys_start;
> 		}
>
> 		offset += paddr - phys_start;
>@@ -677,6 +678,7 @@ readpage_elf(unsigned long long paddr, v
> 				return FALSE;
> 			}
> 			p += size;
>+			paddr += size;
> 		}
> 		if (p < endp) {
> 			size = phys_end - paddr;
>@@ -684,6 +686,7 @@ readpage_elf(unsigned long long paddr, v
> 				size = endp - p;
> 			memset(p, 0, size);
> 			p += size;
>+			paddr += size;
> 		}
> 	}
>
>@@ -691,7 +694,7 @@ readpage_elf(unsigned long long paddr, v
> 		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
> 		       paddr);
> 		return FALSE;
>-	} else if (p < bufptr)
>+	} else if (p < endp)
> 		memset(p, 0, endp - p);
>
> 	return TRUE;
>@@ -708,14 +711,15 @@ readpage_elf_parallel(int fd_memory, uns
> 	p = bufptr;
> 	endp = p + info->page_size;
> 	while (p < endp) {
>-		idx = closest_pt_load(paddr + (p - bufptr), endp - p);
>+		idx = closest_pt_load(paddr, endp - p);
> 		if (idx < 0)
> 			break;
>
> 		get_pt_load_extents(idx, &phys_start, &phys_end, &offset, &size);
>-		if (phys_start > paddr + (p - bufptr)) {
>+		if (phys_start > paddr) {
> 			memset(p, 0, phys_start - paddr);
> 			p += phys_start - paddr;
>+			paddr = phys_start;
> 		}
>
> 		offset += paddr - phys_start;
>@@ -730,6 +734,7 @@ readpage_elf_parallel(int fd_memory, uns
> 				return FALSE;
> 			}
> 			p += size;
>+			paddr += size;
> 		}
> 		if (p < endp) {
> 			size = phys_end - paddr;
>@@ -737,6 +742,7 @@ readpage_elf_parallel(int fd_memory, uns
> 				size = endp - p;
> 			memset(p, 0, size);
> 			p += size;
>+			paddr += size;
> 		}
> 	}
>
>@@ -744,7 +750,7 @@ readpage_elf_parallel(int fd_memory, uns
> 		ERRMSG("Attempt to read non-existent page at 0x%llx.\n",
> 		       paddr);
> 		return FALSE;
>-	} else if (p < bufptr)
>+	} else if (p < endp)
> 		memset(p, 0, endp - p);
>
> 	return TRUE;

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered
  2016-02-10  7:49 ` [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered Petr Tesarik
@ 2016-05-18  7:44   ` Atsushi Kumagai
  2016-06-07  4:18     ` Atsushi Kumagai
  0 siblings, 1 reply; 15+ messages in thread
From: Atsushi Kumagai @ 2016-05-18  7:44 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: kexec mailing list

Hello Petr,

Sorry for my late comment:

>Pages that were filtered in the original dump file should also be filtered
>in the destination file, i.e. pages between p_filesz and p_memsz must have
>their corresponding bit cleared in the 2nd bitmap. Note that in theory,
>such ELF segments might not refer to filtered pages, because the crash
>kernel copies the program headers verbatim from elfcorehdr=. However, all
>common sources of /proc/vmcore program headers use the same value for each
>p_memsz and p_filesz:
>
>a. kexec(8)
>   Program headers are created in two places, but in both cases the value
>   of p_memsz is equal to p_filesz.
>   Reference: kexec/crashdump-elf.c in kexec-tools
>
>b. kexec_file_load(2)
>   Conceptually the same code as kexec(8): two places, both set p_filesz
>   and p_memsz to the same value.
>   Reference: kexec/crashdump-elf.c in Linux kernel
>
>c. created in the crash kernel (s390)
>   On s390, program headers are created in the crash kernel, but both
>   p_filesz and p_memsz are set to "end - start", i.e. the same value.
>   Reference: arch/s390/kernel/crash_dump.c in Linux kernel
>
>The above means that p_memsz > p_filesz only in ELF files that were
>processed by makedumpfile, hence it can be safely assumed that pages
>between p_filesz and p_memsz were filtered out.
>
>Signed-off-by: Petr Tesarik <ptesarik@suse.com>
>
>---
> elf_info.c     |   26 ++++++++++++++++++++++++++
> elf_info.h     |    5 +++++
> makedumpfile.c |   32 ++++++++++++++++++++++++++++++++
> 3 files changed, 63 insertions(+)
>
>--- a/elf_info.c
>+++ b/elf_info.c
>@@ -1096,6 +1096,32 @@ get_pt_load(int idx,
> 	return TRUE;
> }
>
>+int
>+get_pt_load_extents(int idx,
>+	unsigned long long *phys_start,
>+	unsigned long long *phys_end,
>+	off_t *file_offset,
>+	off_t *file_size)
>+{
>+	struct pt_load_segment *pls;
>+
>+	if (num_pt_loads <= idx)
>+		return FALSE;
>+
>+	pls = &pt_loads[idx];
>+
>+	if (phys_start)
>+		*phys_start  = pls->phys_start;
>+	if (phys_end)
>+		*phys_end    = pls->phys_end;
>+	if (file_offset)
>+		*file_offset = pls->file_offset;
>+	if (file_size)
>+		*file_size   = pls->file_size;
>+
>+	return TRUE;
>+}
>+
> unsigned int
> get_num_pt_loads(void)
> {
>--- a/elf_info.h
>+++ b/elf_info.h
>@@ -61,6 +61,11 @@ int get_pt_load(int idx,
> 	unsigned long long *phys_end,
> 	unsigned long long *virt_start,
> 	unsigned long long *virt_end);
>+int get_pt_load_extents(int idx,
>+	unsigned long long *phys_start,
>+	unsigned long long *phys_end,
>+	off_t *file_offset,
>+	off_t *file_size);
> unsigned int get_num_pt_loads(void);
>
> void set_nr_cpus(int num);
>--- a/makedumpfile.c
>+++ b/makedumpfile.c
>@@ -4395,6 +4395,32 @@ read_start_flat_header(void)
> 	return TRUE;
> }
>
>+static void
>+exclude_nodata_pages(struct cycle *cycle)
>+{
>+	int i;
>+	unsigned long long phys_start, phys_end;
>+	off_t file_size;
>+
>+	i = 0;
>+	while (get_pt_load_extents(i, &phys_start, &phys_end,
>+				   NULL, &file_size)) {
>+		unsigned long long pfn, pfn_end;
>+
>+		pfn = paddr_to_pfn(phys_start + file_size);
>+		pfn_end = paddr_to_pfn(phys_end);

Does this code exclude the first pfn of out of PT_LOAD even if there is
no unstored pages ? pfn and pfn_end will point at the next pfn to the
last pfn of PT_LOAD.
This will be problem for the environments which have a overlapped PT_LOAD
segment like ia64.

>+		if (pfn < cycle->start_pfn)
>+			pfn = cycle->start_pfn;
>+		if (pfn_end >= cycle->end_pfn)
>+			pfn_end = cycle->end_pfn - 1;
>+		while (pfn <= pfn_end) {

Should we change this condition to "pfn < pfn_end" ?

>+			clear_bit_on_2nd_bitmap(pfn, cycle);
>+			++pfn;
>+		}
>+		++i;
>+	}
>+}

Thanks,
Atsushi Kumagai

>+
> int
> read_flat_data_header(struct makedumpfile_data_header *fdh)
> {
>@@ -6087,6 +6113,12 @@ create_2nd_bitmap(struct cycle *cycle)
> 	}
>
> 	/*
>+	 * If re-filtering ELF dump, exclude pages that were already
>+	 * excluded in the original file.
>+	 */
>+	exclude_nodata_pages(cycle);
>+
>+	/*
> 	 * Exclude cache pages, cache private pages, user data pages,
> 	 * and hwpoison pages.
> 	 */
>
>_______________________________________________
>kexec mailing list
>kexec@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered
  2016-05-18  7:44   ` Atsushi Kumagai
@ 2016-06-07  4:18     ` Atsushi Kumagai
  2016-07-12 21:38       ` Petr Tesarik
  0 siblings, 1 reply; 15+ messages in thread
From: Atsushi Kumagai @ 2016-06-07  4:18 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: kexec mailing list

>>+static void
>>+exclude_nodata_pages(struct cycle *cycle)
>>+{
>>+	int i;
>>+	unsigned long long phys_start, phys_end;
>>+	off_t file_size;
>>+
>>+	i = 0;
>>+	while (get_pt_load_extents(i, &phys_start, &phys_end,
>>+				   NULL, &file_size)) {
>>+		unsigned long long pfn, pfn_end;
>>+
>>+		pfn = paddr_to_pfn(phys_start + file_size);
>>+		pfn_end = paddr_to_pfn(phys_end);
>
>Does this code exclude the first pfn of out of PT_LOAD even if there is
>no unstored pages ? pfn and pfn_end will point at the next pfn to the
>last pfn of PT_LOAD.
>This will be problem for the environments which have a overlapped PT_LOAD
>segment like ia64.
>
>>+		if (pfn < cycle->start_pfn)
>>+			pfn = cycle->start_pfn;
>>+		if (pfn_end >= cycle->end_pfn)
>>+			pfn_end = cycle->end_pfn - 1;
>>+		while (pfn <= pfn_end) {
>
>Should we change this condition to "pfn < pfn_end" ?
>
>>+			clear_bit_on_2nd_bitmap(pfn, cycle);
>>+			++pfn;
>>+		}
>>+		++i;
>>+	}
>>+}

I fixed this as below for v1.6.0.
Of course your comment would still be helpful.

--
Author: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Date:   Thu May 26 15:17:25 2016 +0900

    [PATCH] Fix boundary value bug for checking memory holes

    Now the next pfn to the last pfn of PT_LOAD is always excluded
    by boundary value bug.

         --------- PT_LOAD ----------|
         ----+-----------------------+---------------------+----
             |         pfn:N         |       pfn:N+1       | ...
         ----+-----------------------+---------------------+----
                                              ^ this pfn

    This will be problem in the environments which have a overlapped
    PT_LOAD segment like ia64 because the pfn excluded by this bug
    can be included in another PT_LOAD.

    Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>

diff --git a/makedumpfile.c b/makedumpfile.c
index 4f17686..8a80976 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -4449,7 +4449,7 @@ exclude_nodata_pages(struct cycle *cycle)
                        pfn = cycle->start_pfn;
                if (pfn_end >= cycle->end_pfn)
                        pfn_end = cycle->end_pfn - 1;
-               while (pfn <= pfn_end) {
+               while (pfn < pfn_end) {
                        clear_bit_on_2nd_bitmap(pfn, cycle);
                        ++pfn;
                }

Thanks,
Atsushi Kumagai

>
>Thanks,
>Atsushi Kumagai
>
>>+
>> int
>> read_flat_data_header(struct makedumpfile_data_header *fdh)
>> {
>>@@ -6087,6 +6113,12 @@ create_2nd_bitmap(struct cycle *cycle)
>> 	}
>>
>> 	/*
>>+	 * If re-filtering ELF dump, exclude pages that were already
>>+	 * excluded in the original file.
>>+	 */
>>+	exclude_nodata_pages(cycle);
>>+
>>+	/*
>> 	 * Exclude cache pages, cache private pages, user data pages,
>> 	 * and hwpoison pages.
>> 	 */
>>
>>_______________________________________________
>>kexec mailing list
>>kexec@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/kexec
>
>_______________________________________________
>kexec mailing list
>kexec@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered
  2016-06-07  4:18     ` Atsushi Kumagai
@ 2016-07-12 21:38       ` Petr Tesarik
  2016-07-13  7:43         ` Atsushi Kumagai
  0 siblings, 1 reply; 15+ messages in thread
From: Petr Tesarik @ 2016-07-12 21:38 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec mailing list

On Tue, 7 Jun 2016 04:18:48 +0000
Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> wrote:

> >>+static void
> >>+exclude_nodata_pages(struct cycle *cycle)
> >>+{
> >>+	int i;
> >>+	unsigned long long phys_start, phys_end;
> >>+	off_t file_size;
> >>+
> >>+	i = 0;
> >>+	while (get_pt_load_extents(i, &phys_start, &phys_end,
> >>+				   NULL, &file_size)) {
> >>+		unsigned long long pfn, pfn_end;
> >>+
> >>+		pfn = paddr_to_pfn(phys_start + file_size);
> >>+		pfn_end = paddr_to_pfn(phys_end);
> >
> >Does this code exclude the first pfn of out of PT_LOAD even if there is
> >no unstored pages ? pfn and pfn_end will point at the next pfn to the
> >last pfn of PT_LOAD.

Indeed, phys_end is in fact the first physical address that is _not_
contained in the segment.

Good catch!

> >This will be problem for the environments which have a overlapped PT_LOAD
> >segment like ia64.

I think it's quite the opposite. Partial pages on IA64 are the only
ones that were handled correctly with the old code.

> >
> >>+		if (pfn < cycle->start_pfn)
> >>+			pfn = cycle->start_pfn;
> >>+		if (pfn_end >= cycle->end_pfn)
> >>+			pfn_end = cycle->end_pfn - 1;
> >>+		while (pfn <= pfn_end) {
> >
> >Should we change this condition to "pfn < pfn_end" ?

Better than nothing, but if the last page of a LOAD segment is not
partial, it will not be excluded. Except for partial pages, end_pfn
refers to the first PFN _after_ the current segment.

> >
> >>+			clear_bit_on_2nd_bitmap(pfn, cycle);
> >>+			++pfn;
> >>+		}
> >>+		++i;
> >>+	}
> >>+}
> 
> I fixed this as below for v1.6.0.
> Of course your comment would still be helpful.

It comes too late, I'm afraid.

In all ia64 dumps with partial pages I have seen, the full content of
the partial page is indeed stored in another segment, so the page would
be marked incorrectly. OTOH I think the off-by-one bug you fixed
affected all segments, even those without any partial pages; it would
incorrectly mark the first PFN after the segment as filtered.

The perfect solution is to round the starting PFN up instead of down to
preserve partial pages, but I'll have to think about it some more
before I can send a patch.

Thank you for your fix!

Petr T

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered
  2016-07-12 21:38       ` Petr Tesarik
@ 2016-07-13  7:43         ` Atsushi Kumagai
  0 siblings, 0 replies; 15+ messages in thread
From: Atsushi Kumagai @ 2016-07-13  7:43 UTC (permalink / raw)
  To: ptesarik; +Cc: kexec

Hello Petr,

I'm happy to get your reply.

>On Tue, 7 Jun 2016 04:18:48 +0000
>Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> wrote:
>
>> >>+static void
>> >>+exclude_nodata_pages(struct cycle *cycle)
>> >>+{
>> >>+	int i;
>> >>+	unsigned long long phys_start, phys_end;
>> >>+	off_t file_size;
>> >>+
>> >>+	i = 0;
>> >>+	while (get_pt_load_extents(i, &phys_start, &phys_end,
>> >>+				   NULL, &file_size)) {
>> >>+		unsigned long long pfn, pfn_end;
>> >>+
>> >>+		pfn = paddr_to_pfn(phys_start + file_size);
>> >>+		pfn_end = paddr_to_pfn(phys_end);
>> >
>> >Does this code exclude the first pfn of out of PT_LOAD even if there is
>> >no unstored pages ? pfn and pfn_end will point at the next pfn to the
>> >last pfn of PT_LOAD.
>
>Indeed, phys_end is in fact the first physical address that is _not_
>contained in the segment.
>
>Good catch!
>
>> >This will be problem for the environments which have a overlapped PT_LOAD
>> >segment like ia64.
>
>I think it's quite the opposite. Partial pages on IA64 are the only
>ones that were handled correctly with the old code.

I didn't take care partial pages, just considered the case below:

   pfn             1   2   3   4   5   6   7
------------------------------------------------
PT_LAOD(1)       |---+---+---+---|
PT_LAOD(2)               |---+---+---+---|

During checking PT_LOAD(1), the bit of pfn:5 will be dropped if
mem_size equals file_size, but practically the pfn is an used page
as PT_LAOD(2) shows.
If there is only PT_LOAD(1), pfn:5 must be on hole and the actual
harm doesn't occur.

>> >
>> >>+		if (pfn < cycle->start_pfn)
>> >>+			pfn = cycle->start_pfn;
>> >>+		if (pfn_end >= cycle->end_pfn)
>> >>+			pfn_end = cycle->end_pfn - 1;
>> >>+		while (pfn <= pfn_end) {
>> >
>> >Should we change this condition to "pfn < pfn_end" ?
>
>Better than nothing, but if the last page of a LOAD segment is not
>partial, it will not be excluded. Except for partial pages, end_pfn
>refers to the first PFN _after_ the current segment.

If mem_size equals file_size, the last page of a LOAD also shouldn't
be removed. However, the condition of "pfn <= pfn_end" can remove it
since end_pfn refers to the first PFN _after_ the current segment.
I just fixed it.

>> >
>> >>+			clear_bit_on_2nd_bitmap(pfn, cycle);
>> >>+			++pfn;
>> >>+		}
>> >>+		++i;
>> >>+	}
>> >>+}
>>
>> I fixed this as below for v1.6.0.
>> Of course your comment would still be helpful.
>
>It comes too late, I'm afraid.
>
>In all ia64 dumps with partial pages I have seen, the full content of
>the partial page is indeed stored in another segment, so the page would
>be marked incorrectly. OTOH I think the off-by-one bug you fixed
>affected all segments, even those without any partial pages; it would
>incorrectly mark the first PFN after the segment as filtered.
>
>The perfect solution is to round the starting PFN up instead of down to
>preserve partial pages, but I'll have to think about it some more
>before I can send a patch.

Again, I didn't consider partial page cases, I also should reconsider this.
Thank you for pointing it out.


Regards,
Atsushi Kumagai

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-07-13  8:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-10  7:47 [PATCH 0/3] makedumpfile: Fix handling of ELF segments with p_memsz > p_filesz Petr Tesarik
2016-02-10  7:49 ` [PATCH 1/3] makedumpfile: Keep segment memory size when re-filtering ELF dumps Petr Tesarik
2016-02-10  7:49 ` [PATCH 2/3] makedumpfile: Mark unstored ELF pages as filtered Petr Tesarik
2016-05-18  7:44   ` Atsushi Kumagai
2016-06-07  4:18     ` Atsushi Kumagai
2016-07-12 21:38       ` Petr Tesarik
2016-07-13  7:43         ` Atsushi Kumagai
2016-02-10  7:50 ` [PATCH 3/3] makedumpfile: Rewrite readpage_elf Petr Tesarik
2016-02-10 22:18   ` Ivan Delalande
2016-02-17  7:58     ` Atsushi Kumagai
2016-02-29 18:55   ` Petr Tesarik
2016-02-29 18:59     ` [PATCH] makedumpfile: When reading partial ELF pages, check final pointer against buffer end Petr Tesarik
2016-02-29 19:40       ` Petr Tesarik
2016-03-08 11:58         ` [PATCH] makedumpfile: Fix several issues with reading ELF pages Petr Tesarik
2016-03-11  9:16           ` Atsushi Kumagai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.