* [PATCH v8 0/9] kdump, vmcore: support mmap() on /proc/vmcore
@ 2013-05-23  5:24 ` HATAYAMA Daisuke
From: HATAYAMA Daisuke @ 2013-05-23  5:24 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

Currently, reads of /proc/vmcore are done by read_oldmem(), which
calls ioremap/iounmap for each single page. For example, if memory is
1GB, ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
times. This causes a big performance degradation due to repeated page
table changes, TLB flushes and the build-up of VM-related objects.

To address the issue, this patch set implements mmap() on /proc/vmcore
to improve read performance for sufficiently large mapping sizes.

In particular, the current main user of this mmap() is makedumpfile,
which not only reads memory from /proc/vmcore but also performs other
processing such as filtering, compression and I/O.
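
As a rough illustration of the access pattern this enables, the sketch
below copies from /proc/vmcore by mmap()ing it in large chunks instead
of read()ing it page by page. This is only a sketch, not makedumpfile's
actual code; the 4 MiB chunk size, the page-aligned starting offset and
the error handling are assumptions.

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define CHUNK (4UL << 20)	/* assumed chunk size, a multiple of the page size */

	/*
	 * Copy [offset, offset + len) of /proc/vmcore into 'out' via mmap().
	 * 'offset' must be page aligned, as mmap() requires.
	 */
	static int copy_vmcore_mmap(int fd, off_t offset, size_t len, char *out)
	{
		while (len > 0) {
			size_t tsz = len < CHUNK ? len : CHUNK;
			char *p = mmap(NULL, tsz, PROT_READ, MAP_PRIVATE, fd, offset);

			if (p == MAP_FAILED)
				return -1;	/* a real tool could fall back to read() */
			memcpy(out, p, tsz);	/* or filter/compress in place */
			munmap(p, tsz);
			out += tsz;
			offset += tsz;
			len -= tsz;
		}
		return 0;
	}

Each mmap() here covers up to 1024 pages, so the mapping setup cost is
paid once per chunk rather than once per page.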

Benchmark
=========

Two benchmarks on terabyte-memory systems are linked below. Both show
about 40 seconds on a 2TB system, which is almost equal to the
performance of experimental kernel-side memory filtering.

- makedumpfile mmap() benchmark, by Jingbai Ma
  https://lkml.org/lkml/2013/3/27/19

- makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
  https://lkml.org/lkml/2013/3/26/914

ChangeLog
=========

v7 => v8)

- Unify set_vmcore_list_offsets_elf{64,32} as set_vmcore_list_offsets.
  [Patch 2/9]
- Introduce update_note_header_size_elf{64,32} and clean up
  get_note_number_and_size_elf{64,32} and copy_notes_elf{64,32}.
  [Patch 6/9]
- Create a new patch that sets the VM_USERMAP flag in the VM object
  for the ELF note segment buffer.
  [Patch 7/9]
- Unify get_vmcore_size_elf{64,32} as get_vmcore_size.
  [Patch 8/9]

v6 => v7)

- Rebase onto 3.10-rc2.
- Move the roundup operation on the note segment from patch 2/8 to patch 6/8.
- Rewrite get_note_number_and_size_elf{64,32} and
  copy_notes_elf{64,32} in patch 6/8.

v5 => v6)

- Change patch order: cleanup patch => PT_LOAD change patch =>
  vmalloc-related patch => mmap patch.
- Some cleanups: simplify symbol names, add helper functions for
  processing the ELF note segment and add comments for the helper
  functions.
- Fix patch description of patch 7/8.

v4 => v5)

- Rebase onto 3.10-rc1.
- Introduce remap_vmalloc_range_partial() in order to remap vmalloc
  memory into part of a VMA.
- Allocate the buffer for the ELF note segment in the 2nd kernel with
  vmalloc(). Use remap_vmalloc_range_partial() to remap the memory to
  userspace.

v3 => v4)

- Rebase onto 3.9-rc7.
- Drop clean-up patches orthogonal to the main topic of this patch set.
- Copy ELF note segments into the 2nd kernel just as in v1. Allocate
  vmcore objects per page. => See [PATCH 5/8]
- Map memory referenced by a PT_LOAD entry directly even if the start
  or end of the region does not fit on a page boundary, instead of
  copying it as in the previous v3. As a result, holes outside OS
  memory become visible from /proc/vmcore. => See [PATCH 7/8]

v2 => v3)

- Rebase onto 3.9-rc3.
- Copy the program headers, located via e_phoff, separately into the
  ELF note segment buffer. Now there is no risk of allocating huge
  memory if the program header table is positioned after the memory
  segments.
- Add a cleanup patch that removes an unnecessary variable.
- Fix wrong use of the variable holding the buffer size configurable
  at runtime; use the variable holding the original buffer size
  instead.

v1 => v2)

- Clean up the existing code: use e_phoff and remove the assumption
  on PT_NOTE entries.
- Fix a potential bug where the ELF header size was not included in
  the exported vmcoreinfo size.
- Divide the patch modifying read_vmcore() into two: a clean-up and
  the primary code change.
- Place ELF note segments on page-size boundaries in the 1st kernel
  instead of copying them into a buffer in the 2nd kernel.

Test
====

This patch set is based on v3.10-rc2 and was tested on x86_64 and
x86_32, both with 1GB and with 5GB (over 4GB) memory configurations.

---

HATAYAMA Daisuke (9):
      vmcore: support mmap() on /proc/vmcore
      vmcore: calculate vmcore file size from buffer size and total size of vmcore objects
      vmcore: Allow user process to remap ELF note segment buffer
      vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory
      vmalloc: introduce remap_vmalloc_range_partial
      vmalloc: make find_vm_area check in range
      vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
      vmcore: allocate buffer for ELF headers on page-size alignment
      vmcore: clean up read_vmcore()


 fs/proc/vmcore.c        |  657 +++++++++++++++++++++++++++++++++--------------
 include/linux/vmalloc.h |    4 
 mm/vmalloc.c            |   65 +++--
 3 files changed, 515 insertions(+), 211 deletions(-)

-- 

Thanks.
HATAYAMA, Daisuke


* [PATCH v8 1/9] vmcore: clean up read_vmcore()
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

Rewrite the part of read_vmcore() that reads objects in vmcore_list in
the same way as the part that reads the ELF headers; this removes some
duplicated and redundant code.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
---

 fs/proc/vmcore.c |   68 ++++++++++++++++--------------------------------------
 1 files changed, 20 insertions(+), 48 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 17f7e08..ab0c92e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -118,27 +118,6 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
 	return read;
 }
 
-/* Maps vmcore file offset to respective physical address in memroy. */
-static u64 map_offset_to_paddr(loff_t offset, struct list_head *vc_list,
-					struct vmcore **m_ptr)
-{
-	struct vmcore *m;
-	u64 paddr;
-
-	list_for_each_entry(m, vc_list, list) {
-		u64 start, end;
-		start = m->offset;
-		end = m->offset + m->size - 1;
-		if (offset >= start && offset <= end) {
-			paddr = m->paddr + offset - start;
-			*m_ptr = m;
-			return paddr;
-		}
-	}
-	*m_ptr = NULL;
-	return 0;
-}
-
 /* Read from the ELF header and then the crash dump. On error, negative value is
  * returned otherwise number of bytes read are returned.
  */
@@ -147,8 +126,8 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 {
 	ssize_t acc = 0, tmp;
 	size_t tsz;
-	u64 start, nr_bytes;
-	struct vmcore *curr_m = NULL;
+	u64 start;
+	struct vmcore *m = NULL;
 
 	if (buflen == 0 || *fpos >= vmcore_size)
 		return 0;
@@ -174,33 +153,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 			return acc;
 	}
 
-	start = map_offset_to_paddr(*fpos, &vmcore_list, &curr_m);
-	if (!curr_m)
-        	return -EINVAL;
-
-	while (buflen) {
-		tsz = min_t(size_t, buflen, PAGE_SIZE - (start & ~PAGE_MASK));
-
-		/* Calculate left bytes in current memory segment. */
-		nr_bytes = (curr_m->size - (start - curr_m->paddr));
-		if (tsz > nr_bytes)
-			tsz = nr_bytes;
-
-		tmp = read_from_oldmem(buffer, tsz, &start, 1);
-		if (tmp < 0)
-			return tmp;
-		buflen -= tsz;
-		*fpos += tsz;
-		buffer += tsz;
-		acc += tsz;
-		if (start >= (curr_m->paddr + curr_m->size)) {
-			if (curr_m->list.next == &vmcore_list)
-				return acc;	/*EOF*/
-			curr_m = list_entry(curr_m->list.next,
-						struct vmcore, list);
-			start = curr_m->paddr;
+	list_for_each_entry(m, &vmcore_list, list) {
+		if (*fpos < m->offset + m->size) {
+			tsz = m->offset + m->size - *fpos;
+			if (buflen < tsz)
+				tsz = buflen;
+			start = m->paddr + *fpos - m->offset;
+			tmp = read_from_oldmem(buffer, tsz, &start, 1);
+			if (tmp < 0)
+				return tmp;
+			buflen -= tsz;
+			*fpos += tsz;
+			buffer += tsz;
+			acc += tsz;
+
+			/* leave now if filled buffer already */
+			if (buflen == 0)
+				return acc;
 		}
 	}
+
 	return acc;
 }
 



* [PATCH v8 2/9] vmcore: allocate buffer for ELF headers on page-size alignment
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

Allocate the ELF headers on a page-size boundary using
__get_free_pages() instead of kmalloc().

A later patch will merge the PT_NOTE entries into a single unique one,
decreasing the buffer size actually used. Keep the original buffer
size in the variable elfcorebuf_sz_orig so the buffer can be freed
later, and separately keep the actually used buffer size, rounded up
to a page-size boundary, in the variable elfcorebuf_sz.

The part of the ELF buffer exported from /proc/vmcore has size
elfcorebuf_sz.

The range previously occupied by the merged and removed PT_NOTE
entries, i.e. [elfcorebuf_sz, elfcorebuf_sz_orig], is filled with
zeros.

Use the size of the ELF headers as the initial offset value in
set_vmcore_list_offsets_elf{64,32} and
process_ptload_program_headers_elf{64,32} in order to indicate that
the offset includes the holes up to the page boundary.

As a result, set_vmcore_list_offsets_elf{64,32} end up with the same
definition. Merge them into set_vmcore_list_offsets.
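
For concreteness, here is a small userspace sketch of the size
bookkeeping described above, with a made-up e_phnum and PT_NOTE count
(an illustration only, not the kernel code):

	#include <stdio.h>

	#define PAGE_SIZE 4096UL
	#define roundup(x, y) ((((x) + (y) - 1) / (y)) * (y))

	int main(void)
	{
		unsigned long phnum = 10;	/* assumed e_phnum */
		unsigned long nr_ptnote = 4;	/* assumed number of PT_NOTE entries */
		/* sizeof(Elf64_Ehdr) == 64, sizeof(Elf64_Phdr) == 56 */
		unsigned long elfcorebuf_sz_orig = 64 + phnum * 56;	/* 624 */
		/* merging the PT_NOTE entries drops nr_ptnote - 1 headers */
		unsigned long elfcorebuf_sz = elfcorebuf_sz_orig - (nr_ptnote - 1) * 56;

		/* [456, 624) is memset to 0 by the merge; the rest of the
		 * page is already zero thanks to __GFP_ZERO */
		elfcorebuf_sz = roundup(elfcorebuf_sz, PAGE_SIZE);	/* 4096 */
		printf("orig=%lu exported=%lu\n", elfcorebuf_sz_orig, elfcorebuf_sz);
		return 0;
	}

So a single zeroed page backs the exported header area here, while
elfcorebuf_sz_orig (624 bytes) is what free_pages() needs via
get_order().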

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   94 ++++++++++++++++++++++++------------------------------
 1 files changed, 42 insertions(+), 52 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index ab0c92e..80fea97 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -32,6 +32,7 @@ static LIST_HEAD(vmcore_list);
 /* Stores the pointer to the buffer containing kernel elf core headers. */
 static char *elfcorebuf;
 static size_t elfcorebuf_sz;
+static size_t elfcorebuf_sz_orig;
 
 /* Total size of vmcore file. */
 static u64 vmcore_size;
@@ -186,7 +187,7 @@ static struct vmcore* __init get_new_element(void)
 	return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
 }
 
-static u64 __init get_vmcore_size_elf64(char *elfptr)
+static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
 {
 	int i;
 	u64 size;
@@ -195,7 +196,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
 
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
-	size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
+	size = elfsz;
 	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
 		size += phdr_ptr->p_memsz;
 		phdr_ptr++;
@@ -203,7 +204,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
 	return size;
 }
 
-static u64 __init get_vmcore_size_elf32(char *elfptr)
+static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
 {
 	int i;
 	u64 size;
@@ -212,7 +213,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
 
 	ehdr_ptr = (Elf32_Ehdr *)elfptr;
 	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
-	size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
+	size = elfsz;
 	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
 		size += phdr_ptr->p_memsz;
 		phdr_ptr++;
@@ -294,6 +295,8 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
 	*elfsz = *elfsz - i;
 	memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
+	memset(elfptr + *elfsz, 0, i);
+	*elfsz = roundup(*elfsz, PAGE_SIZE);
 
 	/* Modify e_phnum to reflect merged headers. */
 	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -375,6 +378,8 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 	i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
 	*elfsz = *elfsz - i;
 	memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
+	memset(elfptr + *elfsz, 0, i);
+	*elfsz = roundup(*elfsz, PAGE_SIZE);
 
 	/* Modify e_phnum to reflect merged headers. */
 	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
@@ -398,8 +403,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
 
 	/* First program header is PT_NOTE header. */
-	vmcore_off = sizeof(Elf64_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
+	vmcore_off = elfsz +
 			phdr_ptr->p_memsz; /* Note sections */
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
@@ -435,8 +439,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
 
 	/* First program header is PT_NOTE header. */
-	vmcore_off = sizeof(Elf32_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
+	vmcore_off = elfsz +
 			phdr_ptr->p_memsz; /* Note sections */
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
@@ -459,38 +462,14 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf64(char *elfptr,
-						struct list_head *vc_list)
+static void __init set_vmcore_list_offsets(size_t elfsz,
+					   struct list_head *vc_list)
 {
 	loff_t vmcore_off;
-	Elf64_Ehdr *ehdr_ptr;
 	struct vmcore *m;
 
-	ehdr_ptr = (Elf64_Ehdr *)elfptr;
-
-	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf64_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
-
-	list_for_each_entry(m, vc_list, list) {
-		m->offset = vmcore_off;
-		vmcore_off += m->size;
-	}
-}
-
-/* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf32(char *elfptr,
-						struct list_head *vc_list)
-{
-	loff_t vmcore_off;
-	Elf32_Ehdr *ehdr_ptr;
-	struct vmcore *m;
-
-	ehdr_ptr = (Elf32_Ehdr *)elfptr;
-
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf32_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
+	vmcore_off = elfsz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -526,30 +505,35 @@ static int __init parse_crash_elf64_headers(void)
 	}
 
 	/* Read in all elf headers. */
-	elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
-	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+	elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
+	elfcorebuf_sz = elfcorebuf_sz_orig;
+	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					       get_order(elfcorebuf_sz_orig));
 	if (!elfcorebuf)
 		return -ENOMEM;
 	addr = elfcorehdr_addr;
-	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
+	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
 	if (rc < 0) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 
 	/* Merge all PT_NOTE headers into one. */
 	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
 							&vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
-	set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
 	return 0;
 }
 
@@ -581,30 +565,35 @@ static int __init parse_crash_elf32_headers(void)
 	}
 
 	/* Read in all elf headers. */
-	elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
-	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+	elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
+	elfcorebuf_sz = elfcorebuf_sz_orig;
+	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					       get_order(elfcorebuf_sz_orig));
 	if (!elfcorebuf)
 		return -ENOMEM;
 	addr = elfcorehdr_addr;
-	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
+	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
 	if (rc < 0) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 
 	/* Merge all PT_NOTE headers into one. */
 	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
 								&vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
-	set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
 	return 0;
 }
 
@@ -629,14 +618,14 @@ static int __init parse_crash_elf_headers(void)
 			return rc;
 
 		/* Determine vmcore size. */
-		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+		vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
 	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
 		rc = parse_crash_elf32_headers();
 		if (rc)
 			return rc;
 
 		/* Determine vmcore size. */
-		vmcore_size = get_vmcore_size_elf32(elfcorebuf);
+		vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
 	} else {
 		pr_warn("Warning: Core image elf header is not sane\n");
 		return -EINVAL;
@@ -683,7 +672,8 @@ void vmcore_cleanup(void)
 		list_del(&m->list);
 		kfree(m);
 	}
-	kfree(elfcorebuf);
+	free_pages((unsigned long)elfcorebuf,
+		   get_order(elfcorebuf_sz_orig));
 	elfcorebuf = NULL;
 }
 EXPORT_SYMBOL_GPL(vmcore_cleanup);



* [PATCH v8 3/9] vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

Treat memory chunks referenced by PT_LOAD program header entries on
page-size boundaries in vmcore_list. Formally, for each range [start,
end], we set up the corresponding vmcore object in vmcore_list to
[rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].

This change affects the layout of /proc/vmcore. The gaps generated by
the rearrangement are newly made visible to applications as holes.
Concretely, they are the two ranges [rounddown(start, PAGE_SIZE),
start] and [end, roundup(end, PAGE_SIZE)].

Suppose variable m points at a vmcore object in vmcore_list, and
variable phdr points at the program header of PT_LOAD type the
variable m corresponds to. Then, pictorially:

  m->offset                    +---------------+
                               | hole          |
phdr->p_offset =               +---------------+
  m->offset + (paddr - start)  |               |\
                               | kernel memory | phdr->p_memsz
                               |               |/
                               +---------------+
                               | hole          |
  m->offset + m->size          +---------------+

where m->offset and m->offset + m->size are always page-size aligned.
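
As a concrete numeric instance of this rounding (a minimal sketch; the
PT_LOAD addresses are made up):

	#include <stdio.h>

	#define PAGE_SIZE 4096ULL
	#define rounddown(x, y) (((x) / (y)) * (y))
	#define roundup(x, y)   ((((x) + (y) - 1) / (y)) * (y))

	int main(void)
	{
		/* assumed PT_LOAD range, unaligned at both ends */
		unsigned long long paddr = 0x100234ULL;	/* physical start */
		unsigned long long memsz = 0x5000ULL;	/* phdr->p_memsz */
		unsigned long long start = rounddown(paddr, PAGE_SIZE);	/* 0x100000 */
		unsigned long long end = roundup(paddr + memsz, PAGE_SIZE);	/* 0x106000 */

		printf("m->paddr=%#llx m->size=%#llx\n", start, end - start);
		printf("front hole=%#llx rear hole=%#llx bytes\n",
		       paddr - start, end - (paddr + memsz));
		return 0;
	}

This prints a 0x234-byte front hole and a 0xdcc-byte rear hole, exactly
the two regions shown in the picture above.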

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
---

 fs/proc/vmcore.c |   30 ++++++++++++++++++++++--------
 1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 80fea97..686068d 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -407,20 +407,27 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 			phdr_ptr->p_memsz; /* Note sections */
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		u64 paddr, start, end, size;
+
 		if (phdr_ptr->p_type != PT_LOAD)
 			continue;
 
+		paddr = phdr_ptr->p_offset;
+		start = rounddown(paddr, PAGE_SIZE);
+		end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
+		size = end - start;
+
 		/* Add this contiguous chunk of memory to vmcore list.*/
 		new = get_new_element();
 		if (!new)
 			return -ENOMEM;
-		new->paddr = phdr_ptr->p_offset;
-		new->size = phdr_ptr->p_memsz;
+		new->paddr = start;
+		new->size = size;
 		list_add_tail(&new->list, vc_list);
 
 		/* Update the program header offset. */
-		phdr_ptr->p_offset = vmcore_off;
-		vmcore_off = vmcore_off + phdr_ptr->p_memsz;
+		phdr_ptr->p_offset = vmcore_off + (paddr - start);
+		vmcore_off = vmcore_off + size;
 	}
 	return 0;
 }
@@ -443,20 +450,27 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 			phdr_ptr->p_memsz; /* Note sections */
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		u64 paddr, start, end, size;
+
 		if (phdr_ptr->p_type != PT_LOAD)
 			continue;
 
+		paddr = phdr_ptr->p_offset;
+		start = rounddown(paddr, PAGE_SIZE);
+		end = roundup(paddr + phdr_ptr->p_memsz, PAGE_SIZE);
+		size = end - start;
+
 		/* Add this contiguous chunk of memory to vmcore list.*/
 		new = get_new_element();
 		if (!new)
 			return -ENOMEM;
-		new->paddr = phdr_ptr->p_offset;
-		new->size = phdr_ptr->p_memsz;
+		new->paddr = start;
+		new->size = size;
 		list_add_tail(&new->list, vc_list);
 
 		/* Update the program header offset */
-		phdr_ptr->p_offset = vmcore_off;
-		vmcore_off = vmcore_off + phdr_ptr->p_memsz;
+		phdr_ptr->p_offset = vmcore_off + (paddr - start);
+		vmcore_off = vmcore_off + size;
 	}
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 103+ messages in thread


* [PATCH v8 4/9] vmalloc: make find_vm_area check in range
  2013-05-23  5:24 ` HATAYAMA Daisuke
  (?)
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

Currently, __find_vmap_area searches for the kernel VM area starting
at a given address. This patch changes this behavior so that it
searches for the kernel VM area to which the address belongs. This
change is needed by remap_vmalloc_range_partial, introduced in a later
patch, which can receive any position within a kernel VM area as its
target address.

This patch changes the condition (addr > va->va_start) to (addr >=
va->va_end), taking advantage of the fact that kernel VM areas never
overlap: descending right only when the address is at or beyond va_end
means any address inside [va_start, va_end) now matches its containing
area.
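
To illustrate the containment test, a hedged userspace sketch over a
sorted array rather than the kernel's rbtree (names are illustrative):

    #include <stdio.h>

    /* Non-overlapping, sorted areas; each covers [start, end). */
    struct area { unsigned long start, end; };

    static struct area *find_area(struct area *v, int n, unsigned long addr)
    {
            int lo = 0, hi = n - 1;

            while (lo <= hi) {
                    int mid = (lo + hi) / 2;

                    if (addr < v[mid].start)
                            hi = mid - 1;        /* go left */
                    else if (addr >= v[mid].end)
                            lo = mid + 1;        /* go right */
                    else
                            return &v[mid];      /* addr inside area */
            }
            return NULL;
    }

    int main(void)
    {
            struct area v[] = { { 0x1000, 0x3000 }, { 0x5000, 0x6000 } };

            /* With the old (addr > start) test, 0x2fff would be missed;
             * with (addr >= end) it matches the containing area. */
            printf("%s\n", find_area(v, 2, 0x2fff) ? "found" : "not found");
            return 0;
    }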

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---

 mm/vmalloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d365724..3875fa2 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -292,7 +292,7 @@ static struct vmap_area *__find_vmap_area(unsigned long addr)
 		va = rb_entry(n, struct vmap_area, rb_node);
 		if (addr < va->va_start)
 			n = n->rb_left;
-		else if (addr > va->va_start)
+		else if (addr >= va->va_end)
 			n = n->rb_right;
 		else
 			return va;


^ permalink raw reply related	[flat|nested] 103+ messages in thread


* [PATCH v8 5/9] vmalloc: introduce remap_vmalloc_range_partial
  2013-05-23  5:24 ` HATAYAMA Daisuke
  (?)
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

We want to allocate the ELF note segment buffer in the 2nd kernel in
vmalloc space and remap it to user-space, in order to reduce the risk
that memory allocation fails on systems with a huge number of CPUs,
and hence an ELF note segment so large that it exceeds the 11-order
block size.

Although remap_vmalloc_range already exists for remapping vmalloc
memory to user-space, it requires the user-space range to be specified
via a vma that it maps in full. mmap() on /proc/vmcore needs to remap
a range across multiple objects, so an interface that requires the vma
to cover the full range is problematic.

This patch introduces remap_vmalloc_range_partial, which receives the
user-space range as a pair of base address and size, and so can be
used for the mmap() on /proc/vmcore case.

remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
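
A hedged sketch of how an mmap handler might use the new interface to
fill only part of a vma from a vmalloc'ed buffer (the buffer and names
here are illustrative; the buffer must come from a VM_USERMAP area,
e.g. one allocated with vmalloc_user()):

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    /* Illustrative buffer, assumed allocated with vmalloc_user(). */
    static char *notes_buf;

    static int example_mmap(struct file *file, struct vm_area_struct *vma)
    {
            /* Map one page of notes_buf at the start of the vma; the
             * remainder of the vma would be filled from other objects. */
            return remap_vmalloc_range_partial(vma, vma->vm_start,
                                               notes_buf, PAGE_SIZE);
    }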

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 include/linux/vmalloc.h |    4 +++
 mm/vmalloc.c            |   63 +++++++++++++++++++++++++++++++++--------------
 2 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 7d5773a..dd0a2c8 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -82,6 +82,10 @@ extern void *vmap(struct page **pages, unsigned int count,
 			unsigned long flags, pgprot_t prot);
 extern void vunmap(const void *addr);
 
+extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
+				       unsigned long uaddr, void *kaddr,
+				       unsigned long size);
+
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 							unsigned long pgoff);
 void vmalloc_sync_all(void);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3875fa2..d9a9f4f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2148,42 +2148,44 @@ finished:
 }
 
 /**
- *	remap_vmalloc_range  -  map vmalloc pages to userspace
- *	@vma:		vma to cover (map full range of vma)
- *	@addr:		vmalloc memory
- *	@pgoff:		number of pages into addr before first page to map
+ *	remap_vmalloc_range_partial  -  map vmalloc pages to userspace
+ *	@vma:		vma to cover
+ *	@uaddr:		target user address to start at
+ *	@kaddr:		virtual address of vmalloc kernel memory
+ *	@size:		size of map area
  *
  *	Returns:	0 for success, -Exxx on failure
  *
- *	This function checks that addr is a valid vmalloc'ed area, and
- *	that it is big enough to cover the vma. Will return failure if
- *	that criteria isn't met.
+ *	This function checks that @kaddr is a valid vmalloc'ed area,
+ *	and that it is big enough to cover the range starting at
+ *	@uaddr in @vma. Will return failure if that criteria isn't
+ *	met.
  *
  *	Similar to remap_pfn_range() (see mm/memory.c)
  */
-int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
-						unsigned long pgoff)
+int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
+				void *kaddr, unsigned long size)
 {
 	struct vm_struct *area;
-	unsigned long uaddr = vma->vm_start;
-	unsigned long usize = vma->vm_end - vma->vm_start;
 
-	if ((PAGE_SIZE-1) & (unsigned long)addr)
+	size = PAGE_ALIGN(size);
+
+	if (((PAGE_SIZE-1) & (unsigned long)uaddr) ||
+	    ((PAGE_SIZE-1) & (unsigned long)kaddr))
 		return -EINVAL;
 
-	area = find_vm_area(addr);
+	area = find_vm_area(kaddr);
 	if (!area)
 		return -EINVAL;
 
 	if (!(area->flags & VM_USERMAP))
 		return -EINVAL;
 
-	if (usize + (pgoff << PAGE_SHIFT) > area->size - PAGE_SIZE)
+	if (kaddr + size > area->addr + area->size)
 		return -EINVAL;
 
-	addr += pgoff << PAGE_SHIFT;
 	do {
-		struct page *page = vmalloc_to_page(addr);
+		struct page *page = vmalloc_to_page(kaddr);
 		int ret;
 
 		ret = vm_insert_page(vma, uaddr, page);
@@ -2191,14 +2193,37 @@ int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 			return ret;
 
 		uaddr += PAGE_SIZE;
-		addr += PAGE_SIZE;
-		usize -= PAGE_SIZE;
-	} while (usize > 0);
+		kaddr += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	} while (size > 0);
 
 	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
 
 	return 0;
 }
+EXPORT_SYMBOL(remap_vmalloc_range_partial);
+
+/**
+ *	remap_vmalloc_range  -  map vmalloc pages to userspace
+ *	@vma:		vma to cover (map full range of vma)
+ *	@addr:		vmalloc memory
+ *	@pgoff:		number of pages into addr before first page to map
+ *
+ *	Returns:	0 for success, -Exxx on failure
+ *
+ *	This function checks that addr is a valid vmalloc'ed area, and
+ *	that it is big enough to cover the vma. Will return failure if
+ *	that criteria isn't met.
+ *
+ *	Similar to remap_pfn_range() (see mm/memory.c)
+ */
+int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
+						unsigned long pgoff)
+{
+	return remap_vmalloc_range_partial(vma, vma->vm_start,
+					   addr + (pgoff << PAGE_SHIFT),
+					   vma->vm_end - vma->vm_start);
+}
 EXPORT_SYMBOL(remap_vmalloc_range);
 
 /*


^ permalink raw reply related	[flat|nested] 103+ messages in thread


* [PATCH v8 6/9] vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory
  2013-05-23  5:24 ` HATAYAMA Daisuke
  (?)
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

The reasons why we don't allocate the ELF note segment in the 1st
kernel (old memory) on a page boundary are to keep backward
compatibility with old kernels, and that doing so would waste a
non-trivial amount of memory due to the round-up to page boundaries,
since most of the buffers are in per-cpu areas.

ELF notes are per-cpu, so the total size of the ELF note segments
depends on the number of CPUs. The current maximum number of CPUs on
x86_64 is 5192, and there is already a system with 4192 CPUs from SGI,
where the total size amounts to 1MB. This can grow in the near future,
or possibly even now on another architecture with a larger per-CPU
note size. Thus, to avoid the case where memory allocation for a large
block fails, we allocate the ELF note segment buffer on vmalloc
memory.

This patch adds the elfnotes_buf and elfnotes_sz variables to keep a
pointer to the ELF note segment buffer and its size. There is no
longer a vmcore object corresponding to the ELF note segment in
vmcore_list. Accordingly, read_vmcore() gains a new case for the ELF
note segment, and set_vmcore_list_offsets() and the other helper
functions start calculating offsets from the sum of the size of the
ELF headers and the size of the ELF note segment.
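
A hedged userspace sketch of the resulting /proc/vmcore file layout
and the read-side offset arithmetic (sizes are illustrative):

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
            size_t elfcorebuf_sz = 4096;  /* ELF headers, page-aligned  */
            size_t elfnotes_sz   = 8192;  /* note segment, page-rounded */
            size_t fpos          = 6000;  /* example file read position */

            if (fpos < elfcorebuf_sz)
                    printf("copy from elfcorebuf + %zu\n", fpos);
            else if (fpos < elfcorebuf_sz + elfnotes_sz)
                    printf("copy from elfnotes_buf + %zu\n",
                           fpos - elfcorebuf_sz);
            else
                    printf("copy from a vmcore_list chunk\n");
            return 0;
    }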

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |  355 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 288 insertions(+), 67 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 686068d..937709d 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -34,6 +34,9 @@ static char *elfcorebuf;
 static size_t elfcorebuf_sz;
 static size_t elfcorebuf_sz_orig;
 
+static char *elfnotes_buf;
+static size_t elfnotes_sz;
+
 /* Total size of vmcore file. */
 static u64 vmcore_size;
 
@@ -154,6 +157,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 			return acc;
 	}
 
+	/* Read Elf note segment */
+	if (*fpos < elfcorebuf_sz + elfnotes_sz) {
+		void *kaddr;
+
+		tsz = elfcorebuf_sz + elfnotes_sz - *fpos;
+		if (buflen < tsz)
+			tsz = buflen;
+		kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
+		if (copy_to_user(buffer, kaddr, tsz))
+			return -EFAULT;
+		buflen -= tsz;
+		*fpos += tsz;
+		buffer += tsz;
+		acc += tsz;
+
+		/* leave now if filled buffer already */
+		if (buflen == 0)
+			return acc;
+	}
+
 	list_for_each_entry(m, &vmcore_list, list) {
 		if (*fpos < m->offset + m->size) {
 			tsz = m->offset + m->size - *fpos;
@@ -221,27 +244,27 @@ static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
 	return size;
 }
 
-/* Merges all the PT_NOTE headers into one. */
-static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
-						struct list_head *vc_list)
+/**
+ * update_note_header_size_elf64 - update p_memsz member of each PT_NOTE entry
+ *
+ * @ehdr_ptr: ELF header
+ *
+ * This function updates p_memsz member of each PT_NOTE entry in the
+ * program header table pointed to by @ehdr_ptr to real size of ELF
+ * note segment.
+ */
+static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 {
-	int i, nr_ptnote=0, rc=0;
-	char *tmp;
-	Elf64_Ehdr *ehdr_ptr;
-	Elf64_Phdr phdr, *phdr_ptr;
+	int i, rc=0;
+	Elf64_Phdr *phdr_ptr;
 	Elf64_Nhdr *nhdr_ptr;
-	u64 phdr_sz = 0, note_off;
 
-	ehdr_ptr = (Elf64_Ehdr *)elfptr;
-	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
+	phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
-		int j;
 		void *notes_section;
-		struct vmcore *new;
 		u64 offset, max_sz, sz, real_sz = 0;
 		if (phdr_ptr->p_type != PT_NOTE)
 			continue;
-		nr_ptnote++;
 		max_sz = phdr_ptr->p_memsz;
 		offset = phdr_ptr->p_offset;
 		notes_section = kmalloc(max_sz, GFP_KERNEL);
@@ -253,7 +276,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 			return rc;
 		}
 		nhdr_ptr = notes_section;
-		for (j = 0; j < max_sz; j += sz) {
+		while (real_sz < max_sz) {
 			if (nhdr_ptr->n_namesz == 0)
 				break;
 			sz = sizeof(Elf64_Nhdr) +
@@ -262,26 +285,122 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 			real_sz += sz;
 			nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
 		}
-
-		/* Add this contiguous chunk of notes section to vmcore list.*/
-		new = get_new_element();
-		if (!new) {
-			kfree(notes_section);
-			return -ENOMEM;
-		}
-		new->paddr = phdr_ptr->p_offset;
-		new->size = real_sz;
-		list_add_tail(&new->list, vc_list);
-		phdr_sz += real_sz;
 		kfree(notes_section);
+		phdr_ptr->p_memsz = real_sz;
+	}
+
+	return 0;
+}
+
+/**
+ * get_note_number_and_size_elf64 - get the number of PT_NOTE program
+ * headers and sum of real size of their ELF note segment headers and
+ * data.
+ *
+ * @ehdr_ptr: ELF header
+ * @nr_ptnote: buffer for the number of PT_NOTE program headers
+ * @sz_ptnote: buffer for size of unique PT_NOTE program header
+ *
+ * This function is used to merge multiple PT_NOTE program headers
+ * into a unique single one. The resulting unique entry will have
+ * @sz_ptnote in its phdr->p_memsz.
+ *
+ * It is assumed that program headers with PT_NOTE type pointed to by
+ * @ehdr_ptr have already been updated by update_note_header_size_elf64
+ * and each of PT_NOTE program headers has actual ELF note segment
+ * size in its p_memsz member.
+ */
+static int __init get_note_number_and_size_elf64(const Elf64_Ehdr *ehdr_ptr,
+						 int *nr_ptnote, u64 *sz_ptnote)
+{
+	int i;
+	Elf64_Phdr *phdr_ptr;
+
+	*nr_ptnote = *sz_ptnote = 0;
+
+	phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
+	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		if (phdr_ptr->p_type != PT_NOTE)
+			continue;
+		*nr_ptnote += 1;
+		*sz_ptnote += phdr_ptr->p_memsz;
 	}
 
+	return 0;
+}
+
+/**
+ * copy_notes_elf64 - copy ELF note segments in a given buffer
+ *
+ * @ehdr_ptr: ELF header
+ * @notes_buf: buffer into which ELF note segments are copied
+ *
+ * This function is used to copy ELF note segment in the 1st kernel
+ * into the buffer @notes_buf in the 2nd kernel. It is assumed that
+ * size of the buffer @notes_buf is equal to or larger than sum of the
+ * real ELF note segment headers and data.
+ *
+ * It is assumed that program headers with PT_NOTE type pointed to by
+ * @ehdr_ptr have already been updated by update_note_header_size_elf64
+ * and each of PT_NOTE program headers has actual ELF note segment
+ * size in its p_memsz member.
+ */
+static int __init copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
+{
+	int i, rc=0;
+	Elf64_Phdr *phdr_ptr;
+
+	phdr_ptr = (Elf64_Phdr*)(ehdr_ptr + 1);
+
+	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		u64 offset;
+		if (phdr_ptr->p_type != PT_NOTE)
+			continue;
+		offset = phdr_ptr->p_offset;
+		rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+		if (rc < 0)
+			return rc;
+		notes_buf += phdr_ptr->p_memsz;
+	}
+
+	return 0;
+}
+
+/* Merges all the PT_NOTE headers into one. */
+static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
+					   char **notes_buf, size_t *notes_sz)
+{
+	int i, nr_ptnote=0, rc=0;
+	char *tmp;
+	Elf64_Ehdr *ehdr_ptr;
+	Elf64_Phdr phdr;
+	u64 phdr_sz = 0, note_off;
+
+	ehdr_ptr = (Elf64_Ehdr *)elfptr;
+
+	rc = update_note_header_size_elf64(ehdr_ptr);
+	if (rc < 0)
+		return rc;
+
+	rc = get_note_number_and_size_elf64(ehdr_ptr, &nr_ptnote, &phdr_sz);
+	if (rc < 0)
+		return rc;
+
+	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
+	*notes_buf = vzalloc(*notes_sz);
+	if (!*notes_buf)
+		return -ENOMEM;
+
+	rc = copy_notes_elf64(ehdr_ptr, *notes_buf);
+	if (rc < 0)
+		return rc;
+
 	/* Prepare merged PT_NOTE program header. */
 	phdr.p_type    = PT_NOTE;
 	phdr.p_flags   = 0;
 	note_off = sizeof(Elf64_Ehdr) +
 			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
-	phdr.p_offset  = note_off;
+	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
 	phdr.p_vaddr   = phdr.p_paddr = 0;
 	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
 	phdr.p_align   = 0;
@@ -304,27 +423,27 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	return 0;
 }
 
-/* Merges all the PT_NOTE headers into one. */
-static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
-						struct list_head *vc_list)
+/**
+ * update_note_header_size_elf32 - update p_memsz member of each PT_NOTE entry
+ *
+ * @ehdr_ptr: ELF header
+ *
+ * This function updates p_memsz member of each PT_NOTE entry in the
+ * program header table pointed to by @ehdr_ptr to real size of ELF
+ * note segment.
+ */
+static int __init update_note_header_size_elf32(const Elf32_Ehdr *ehdr_ptr)
 {
-	int i, nr_ptnote=0, rc=0;
-	char *tmp;
-	Elf32_Ehdr *ehdr_ptr;
-	Elf32_Phdr phdr, *phdr_ptr;
+	int i, rc=0;
+	Elf32_Phdr *phdr_ptr;
 	Elf32_Nhdr *nhdr_ptr;
-	u64 phdr_sz = 0, note_off;
 
-	ehdr_ptr = (Elf32_Ehdr *)elfptr;
-	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
+	phdr_ptr = (Elf32_Phdr *)(ehdr_ptr + 1);
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
-		int j;
 		void *notes_section;
-		struct vmcore *new;
 		u64 offset, max_sz, sz, real_sz = 0;
 		if (phdr_ptr->p_type != PT_NOTE)
 			continue;
-		nr_ptnote++;
 		max_sz = phdr_ptr->p_memsz;
 		offset = phdr_ptr->p_offset;
 		notes_section = kmalloc(max_sz, GFP_KERNEL);
@@ -336,7 +455,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 			return rc;
 		}
 		nhdr_ptr = notes_section;
-		for (j = 0; j < max_sz; j += sz) {
+		while (real_sz < max_sz) {
 			if (nhdr_ptr->n_namesz == 0)
 				break;
 			sz = sizeof(Elf32_Nhdr) +
@@ -345,26 +464,122 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 			real_sz += sz;
 			nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
 		}
-
-		/* Add this contiguous chunk of notes section to vmcore list.*/
-		new = get_new_element();
-		if (!new) {
-			kfree(notes_section);
-			return -ENOMEM;
-		}
-		new->paddr = phdr_ptr->p_offset;
-		new->size = real_sz;
-		list_add_tail(&new->list, vc_list);
-		phdr_sz += real_sz;
 		kfree(notes_section);
+		phdr_ptr->p_memsz = real_sz;
+	}
+
+	return 0;
+}
+
+/**
+ * get_note_number_and_size_elf32 - get the number of PT_NOTE program
+ * headers and sum of real size of their ELF note segment headers and
+ * data.
+ *
+ * @ehdr_ptr: ELF header
+ * @nr_ptnote: buffer for the number of PT_NOTE program headers
+ * @sz_ptnote: buffer for size of unique PT_NOTE program header
+ *
+ * This function is used to merge multiple PT_NOTE program headers
+ * into a unique single one. The resulting unique entry will have
+ * @sz_ptnote in its phdr->p_memsz.
+ *
+ * It is assumed that program headers with PT_NOTE type pointed to by
+ * @ehdr_ptr have already been updated by update_note_header_size_elf32
+ * and each of PT_NOTE program headers has actual ELF note segment
+ * size in its p_memsz member.
+ */
+static int __init get_note_number_and_size_elf32(const Elf32_Ehdr *ehdr_ptr,
+						 int *nr_ptnote, u64 *sz_ptnote)
+{
+	int i;
+	Elf32_Phdr *phdr_ptr;
+
+	*nr_ptnote = *sz_ptnote = 0;
+
+	phdr_ptr = (Elf32_Phdr *)(ehdr_ptr + 1);
+	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		if (phdr_ptr->p_type != PT_NOTE)
+			continue;
+		*nr_ptnote += 1;
+		*sz_ptnote += phdr_ptr->p_memsz;
 	}
 
+	return 0;
+}
+
+/**
+ * copy_notes_elf32 - copy ELF note segments in a given buffer
+ *
+ * @ehdr_ptr: ELF header
+ * @notes_buf: buffer into which ELF note segments are copied
+ *
+ * This function is used to copy ELF note segment in the 1st kernel
+ * into the buffer @notes_buf in the 2nd kernel. It is assumed that
+ * size of the buffer @notes_buf is equal to or larger than sum of the
+ * real ELF note segment headers and data.
+ *
+ * It is assumed that program headers with PT_NOTE type pointed to by
+ * @ehdr_ptr have already been updated by update_note_header_size_elf32
+ * and each of PT_NOTE program headers has actual ELF note segment
+ * size in its p_memsz member.
+ */
+static int __init copy_notes_elf32(const Elf32_Ehdr *ehdr_ptr, char *notes_buf)
+{
+	int i, rc=0;
+	Elf32_Phdr *phdr_ptr;
+
+	phdr_ptr = (Elf32_Phdr*)(ehdr_ptr + 1);
+
+	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		u64 offset;
+		if (phdr_ptr->p_type != PT_NOTE)
+			continue;
+		offset = phdr_ptr->p_offset;
+		rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+		if (rc < 0)
+			return rc;
+		notes_buf += phdr_ptr->p_memsz;
+	}
+
+	return 0;
+}
+
+/* Merges all the PT_NOTE headers into one. */
+static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
+					   char **notes_buf, size_t *notes_sz)
+{
+	int i, nr_ptnote=0, rc=0;
+	char *tmp;
+	Elf32_Ehdr *ehdr_ptr;
+	Elf32_Phdr phdr;
+	u64 phdr_sz = 0, note_off;
+
+	ehdr_ptr = (Elf32_Ehdr *)elfptr;
+
+	rc = update_note_header_size_elf32(ehdr_ptr);
+	if (rc < 0)
+		return rc;
+
+	rc = get_note_number_and_size_elf32(ehdr_ptr, &nr_ptnote, &phdr_sz);
+	if (rc < 0)
+		return rc;
+
+	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
+	*notes_buf = vzalloc(*notes_sz);
+	if (!*notes_buf)
+		return -ENOMEM;
+
+	rc = copy_notes_elf32(ehdr_ptr, *notes_buf);
+	if (rc < 0)
+		return rc;
+
 	/* Prepare merged PT_NOTE program header. */
 	phdr.p_type    = PT_NOTE;
 	phdr.p_flags   = 0;
 	note_off = sizeof(Elf32_Ehdr) +
 			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
-	phdr.p_offset  = note_off;
+	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
 	phdr.p_vaddr   = phdr.p_paddr = 0;
 	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
 	phdr.p_align   = 0;
@@ -391,6 +606,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
  * the new offset fields of exported program headers. */
 static int __init process_ptload_program_headers_elf64(char *elfptr,
 						size_t elfsz,
+						size_t elfnotes_sz,
 						struct list_head *vc_list)
 {
 	int i;
@@ -402,9 +618,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
 
-	/* First program header is PT_NOTE header. */
-	vmcore_off = elfsz +
-			phdr_ptr->p_memsz; /* Note sections */
+	/* Skip Elf header, program headers and Elf note segment. */
+	vmcore_off = elfsz + elfnotes_sz;
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
 		u64 paddr, start, end, size;
@@ -434,6 +649,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 
 static int __init process_ptload_program_headers_elf32(char *elfptr,
 						size_t elfsz,
+						size_t elfnotes_sz,
 						struct list_head *vc_list)
 {
 	int i;
@@ -445,9 +661,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 	ehdr_ptr = (Elf32_Ehdr *)elfptr;
 	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
 
-	/* First program header is PT_NOTE header. */
-	vmcore_off = elfsz +
-			phdr_ptr->p_memsz; /* Note sections */
+	/* Skip Elf header, program headers and Elf note segment. */
+	vmcore_off = elfsz + elfnotes_sz;
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
 		u64 paddr, start, end, size;
@@ -476,14 +691,14 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets(size_t elfsz,
+static void __init set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
 					   struct list_head *vc_list)
 {
 	loff_t vmcore_off;
 	struct vmcore *m;
 
-	/* Skip Elf header and program headers. */
-	vmcore_off = elfsz;
+	/* Skip Elf header, program headers and Elf note segment. */
+	vmcore_off = elfsz + elfnotes_sz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -534,20 +749,22 @@ static int __init parse_crash_elf64_headers(void)
 	}
 
 	/* Merge all PT_NOTE headers into one. */
-	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
+	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz,
+				      &elfnotes_buf, &elfnotes_sz);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
-							&vmcore_list);
+						  elfnotes_sz,
+						  &vmcore_list);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
-	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
 	return 0;
 }
 
@@ -594,20 +811,22 @@ static int __init parse_crash_elf32_headers(void)
 	}
 
 	/* Merge all PT_NOTE headers into one. */
-	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
+	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz,
+				      &elfnotes_buf, &elfnotes_sz);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
-								&vmcore_list);
+						  elfnotes_sz,
+						  &vmcore_list);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
-	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
 	return 0;
 }
 
@@ -686,6 +905,8 @@ void vmcore_cleanup(void)
 		list_del(&m->list);
 		kfree(m);
 	}
+	vfree(elfnotes_buf);
+	elfnotes_buf = NULL;
 	free_pages((unsigned long)elfcorebuf,
 		   get_order(elfcorebuf_sz_orig));
 	elfcorebuf = NULL;
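
The real-size walk in update_note_header_size_elf{64,32} above can be
illustrated with a minimal userspace sketch (the 4-byte alignment of
n_namesz/n_descsz follows the ELF note format; the sample note is made
up):

    #include <elf.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Walk concatenated Elf64_Nhdr entries until a zero n_namesz or
     * max_sz is reached, summing the real segment size. */
    static size_t real_note_size(const char *seg, size_t max_sz)
    {
            size_t real_sz = 0;

            while (real_sz < max_sz) {
                    const Elf64_Nhdr *n =
                            (const Elf64_Nhdr *)(seg + real_sz);

                    if (n->n_namesz == 0)
                            break;
                    real_sz += sizeof(*n) +
                               ((n->n_namesz + 3) & ~3UL) +
                               ((n->n_descsz + 3) & ~3UL);
            }
            return real_sz;
    }

    int main(void)
    {
            uint32_t buf[16] = { 0 };     /* 4-byte aligned backing */
            char *seg = (char *)buf;
            Elf64_Nhdr *n = (Elf64_Nhdr *)seg;

            /* One note: name "CORE" (5 bytes, padded to 8), 12-byte desc. */
            n->n_namesz = 5;
            n->n_descsz = 12;
            n->n_type   = 1;
            memcpy(seg + sizeof(*n), "CORE", 5);

            printf("real size: %zu\n", real_note_size(seg, sizeof(buf)));
            return 0;
    }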


^ permalink raw reply related	[flat|nested] 103+ messages in thread

+ * @ehdr_ptr has already been updated by update_note_header_size_elf32
+ * and each of PT_NOTE program headers has actual ELF note segment
+ * size in its p_memsz member.
+ */
+static int __init get_note_number_and_size_elf32(const Elf32_Ehdr *ehdr_ptr,
+						 int *nr_ptnote, u64 *sz_ptnote)
+{
+	int i;
+	Elf32_Phdr *phdr_ptr;
+
+	*nr_ptnote = *sz_ptnote = 0;
+
+	phdr_ptr = (Elf32_Phdr *)(ehdr_ptr + 1);
+	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		if (phdr_ptr->p_type != PT_NOTE)
+			continue;
+		*nr_ptnote += 1;
+		*sz_ptnote += phdr_ptr->p_memsz;
 	}
 
+	return 0;
+}
+
+/**
+ * copy_notes_elf32 - copy ELF note segments in a given buffer
+ *
+ * @ehdr_ptr: ELF header
+ * @notes_buf: buffer into which ELF note segments are copied
+ *
+ * This function is used to copy ELF note segment in the 1st kernel
+ * into the buffer @notes_buf in the 2nd kernel. It is assumed that
+ * size of the buffer @notes_buf is equal to or larger than sum of the
+ * real ELF note segment headers and data.
+ *
+ * It is assumed that program headers with PT_NOTE type pointed to by
+ * @ehdr_ptr has already been updated by update_note_header_size_elf32
+ * and each of PT_NOTE program headers has actual ELF note segment
+ * size in its p_memsz member.
+ */
+static int __init copy_notes_elf32(const Elf32_Ehdr *ehdr_ptr, char *notes_buf)
+{
+	int i, rc=0;
+	Elf32_Phdr *phdr_ptr;
+
+	phdr_ptr = (Elf32_Phdr*)(ehdr_ptr + 1);
+
+	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		u64 offset;
+		if (phdr_ptr->p_type != PT_NOTE)
+			continue;
+		offset = phdr_ptr->p_offset;
+		rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
+		if (rc < 0)
+			return rc;
+		notes_buf += phdr_ptr->p_memsz;
+	}
+
+	return 0;
+}
+
+/* Merges all the PT_NOTE headers into one. */
+static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
+					   char **notes_buf, size_t *notes_sz)
+{
+	int i, nr_ptnote=0, rc=0;
+	char *tmp;
+	Elf32_Ehdr *ehdr_ptr;
+	Elf32_Phdr phdr;
+	u64 phdr_sz = 0, note_off;
+
+	ehdr_ptr = (Elf32_Ehdr *)elfptr;
+
+	rc = update_note_header_size_elf32(ehdr_ptr);
+	if (rc < 0)
+		return rc;
+
+	rc = get_note_number_and_size_elf32(ehdr_ptr, &nr_ptnote, &phdr_sz);
+	if (rc < 0)
+		return rc;
+
+	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
+	*notes_buf = vzalloc(*notes_sz);
+	if (!*notes_buf)
+		return -ENOMEM;
+
+	rc = copy_notes_elf32(ehdr_ptr, *notes_buf);
+	if (rc < 0)
+		return rc;
+
 	/* Prepare merged PT_NOTE program header. */
 	phdr.p_type    = PT_NOTE;
 	phdr.p_flags   = 0;
 	note_off = sizeof(Elf32_Ehdr) +
 			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
-	phdr.p_offset  = note_off;
+	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
 	phdr.p_vaddr   = phdr.p_paddr = 0;
 	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
 	phdr.p_align   = 0;
@@ -391,6 +606,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
  * the new offset fields of exported program headers. */
 static int __init process_ptload_program_headers_elf64(char *elfptr,
 						size_t elfsz,
+						size_t elfnotes_sz,
 						struct list_head *vc_list)
 {
 	int i;
@@ -402,9 +618,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
 
-	/* First program header is PT_NOTE header. */
-	vmcore_off = elfsz +
-			phdr_ptr->p_memsz; /* Note sections */
+	/* Skip Elf header, program headers and Elf note segment. */
+	vmcore_off = elfsz + elfnotes_sz;
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
 		u64 paddr, start, end, size;
@@ -434,6 +649,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 
 static int __init process_ptload_program_headers_elf32(char *elfptr,
 						size_t elfsz,
+						size_t elfnotes_sz,
 						struct list_head *vc_list)
 {
 	int i;
@@ -445,9 +661,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 	ehdr_ptr = (Elf32_Ehdr *)elfptr;
 	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
 
-	/* First program header is PT_NOTE header. */
-	vmcore_off = elfsz +
-			phdr_ptr->p_memsz; /* Note sections */
+	/* Skip Elf header, program headers and Elf note segment. */
+	vmcore_off = elfsz + elfnotes_sz;
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
 		u64 paddr, start, end, size;
@@ -476,14 +691,14 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets(size_t elfsz,
+static void __init set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
 					   struct list_head *vc_list)
 {
 	loff_t vmcore_off;
 	struct vmcore *m;
 
-	/* Skip Elf header and program headers. */
-	vmcore_off = elfsz;
+	/* Skip Elf header, program headers and Elf note segment. */
+	vmcore_off = elfsz + elfnotes_sz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -534,20 +749,22 @@ static int __init parse_crash_elf64_headers(void)
 	}
 
 	/* Merge all PT_NOTE headers into one. */
-	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
+	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz,
+				      &elfnotes_buf, &elfnotes_sz);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
-							&vmcore_list);
+						  elfnotes_sz,
+						  &vmcore_list);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
-	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
 	return 0;
 }
 
@@ -594,20 +811,22 @@ static int __init parse_crash_elf32_headers(void)
 	}
 
 	/* Merge all PT_NOTE headers into one. */
-	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
+	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz,
+				      &elfnotes_buf, &elfnotes_sz);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
-								&vmcore_list);
+						  elfnotes_sz,
+						  &vmcore_list);
 	if (rc) {
 		free_pages((unsigned long)elfcorebuf,
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
-	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
+	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
 	return 0;
 }
 
@@ -686,6 +905,8 @@ void vmcore_cleanup(void)
 		list_del(&m->list);
 		kfree(m);
 	}
+	vfree(elfnotes_buf);
+	elfnotes_buf = NULL;
 	free_pages((unsigned long)elfcorebuf,
 		   get_order(elfcorebuf_sz_orig));
 	elfcorebuf = NULL;
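
As a minimal sketch (not part of the patch; note_entry_size is a
hypothetical helper), the per-entry size computation that the while
loops above rely on: each note entry is an Elf64_Nhdr followed by the
name and descriptor fields, each padded to 4-byte alignment.

	#include <elf.h>
	#include <stddef.h>

	/* Size of one ELF note entry, including the 4-byte padding
	 * of the name and descriptor fields. */
	static size_t note_entry_size(const Elf64_Nhdr *nhdr)
	{
		return sizeof(Elf64_Nhdr) +
		       ((nhdr->n_namesz + 3) & ~3) +
		       ((nhdr->n_descsz + 3) & ~3);
	}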


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v8 7/9] vmcore: Allow user process to remap ELF note segment buffer
  2013-05-23  5:24 ` HATAYAMA Daisuke
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  0 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

The ELF note segment has now been copied into a buffer on vmalloc
memory. To allow a user process to remap the ELF note segment buffer
with remap_vmalloc_range(), the corresponding VM area object has to
have the VM_USERMAP flag set.
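
As a minimal sketch of the generic pattern involved (the example_*
names are hypothetical, not code from this patch): a vmalloc'ed
buffer can be handed to remap_vmalloc_range() from an mmap handler
only if its vm_struct has been marked VM_USERMAP first.

	#include <linux/fs.h>
	#include <linux/mm.h>
	#include <linux/vmalloc.h>

	static void *example_buf;

	static int example_alloc(size_t sz)
	{
		struct vm_struct *vm;

		example_buf = vzalloc(sz);
		if (!example_buf)
			return -ENOMEM;
		/* Without VM_USERMAP, remap_vmalloc_range() below
		 * rejects the area with -EINVAL. */
		vm = find_vm_area(example_buf);
		if (!vm)
			return -EINVAL;
		vm->flags |= VM_USERMAP;
		return 0;
	}

	static int example_mmap(struct file *file, struct vm_area_struct *vma)
	{
		return remap_vmalloc_range(vma, example_buf, vma->vm_pgoff);
	}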

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 937709d..9de4d91 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -375,6 +375,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	Elf64_Ehdr *ehdr_ptr;
 	Elf64_Phdr phdr;
 	u64 phdr_sz = 0, note_off;
+	struct vm_struct *vm;
 
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 
@@ -391,6 +392,12 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	if (!*notes_buf)
 		return -ENOMEM;
 
+	/* Allow users to remap ELF note segment buffer on vmalloc
+	 * memory using remap_vmalloc_range. */
+	vm = find_vm_area(*notes_buf);
+	BUG_ON(!vm);
+	vm->flags |= VM_USERMAP;
+
 	rc = copy_notes_elf64(ehdr_ptr, *notes_buf);
 	if (rc < 0)
 		return rc;
@@ -554,6 +561,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 	Elf32_Ehdr *ehdr_ptr;
 	Elf32_Phdr phdr;
 	u64 phdr_sz = 0, note_off;
+	struct vm_struct *vm;
 
 	ehdr_ptr = (Elf32_Ehdr *)elfptr;
 
@@ -570,6 +578,12 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 	if (!*notes_buf)
 		return -ENOMEM;
 
+	/* Allow users to remap ELF note segment buffer on vmalloc
+	 * memory using remap_vmalloc_range. */
+	vm = find_vm_area(*notes_buf);
+	BUG_ON(!vm);
+	vm->flags |= VM_USERMAP;
+
 	rc = copy_notes_elf32(ehdr_ptr, *notes_buf);
 	if (rc < 0)
 		return rc;


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v8 8/9] vmcore: calculate vmcore file size from buffer size and total size of vmcore objects
  2013-05-23  5:24 ` HATAYAMA Daisuke
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  0 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

The previous patches newly added holes before each chunk of memory,
and the holes need to be counted in the vmcore file size. There are
two ways to compute the file size accordingly:

1) let m be a pointer to the last vmcore object in vmcore_list; then
the file size is (m->offset + m->size), or

2) calculate the sum of the sizes of the buffers for the ELF header,
the program headers, the ELF note segments and the objects in
vmcore_list.

Although 1) is more direct and simpler than 2), 2) seems better in
that it reflects the internal object structure of /proc/vmcore. Thus,
this patch changes get_vmcore_size_elf{64,32} so that they calculate
the size in the way of 2).

As a result, both get_vmcore_size_elf{64,32} have the same
definition. Merge them as get_vmcore_size.
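
For comparison, a sketch of approach 1) under the same structures (the
helper name is hypothetical; the patch implements approach 2)):

	/* Approach 1): the file ends where the last object ends.
	 * Assumes vc_list is non-empty. */
	static u64 get_vmcore_size_from_last(struct list_head *vc_list)
	{
		struct vmcore *m = list_entry(vc_list->prev,
					      struct vmcore, list);

		return m->offset + m->size;
	}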

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   44 +++++++++++---------------------------------
 1 files changed, 11 insertions(+), 33 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 9de4d91..f71157d 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -210,36 +210,15 @@ static struct vmcore* __init get_new_element(void)
 	return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
 }
 
-static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
+static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
+				  struct list_head *vc_list)
 {
-	int i;
 	u64 size;
-	Elf64_Ehdr *ehdr_ptr;
-	Elf64_Phdr *phdr_ptr;
-
-	ehdr_ptr = (Elf64_Ehdr *)elfptr;
-	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
-	size = elfsz;
-	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
-		size += phdr_ptr->p_memsz;
-		phdr_ptr++;
-	}
-	return size;
-}
-
-static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
-{
-	int i;
-	u64 size;
-	Elf32_Ehdr *ehdr_ptr;
-	Elf32_Phdr *phdr_ptr;
+	struct vmcore *m;
 
-	ehdr_ptr = (Elf32_Ehdr *)elfptr;
-	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
-	size = elfsz;
-	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
-		size += phdr_ptr->p_memsz;
-		phdr_ptr++;
+	size = elfsz + elfnotesegsz;
+	list_for_each_entry(m, vc_list, list) {
+		size += m->size;
 	}
 	return size;
 }
@@ -863,20 +842,19 @@ static int __init parse_crash_elf_headers(void)
 		rc = parse_crash_elf64_headers();
 		if (rc)
 			return rc;
-
-		/* Determine vmcore size. */
-		vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
 	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
 		rc = parse_crash_elf32_headers();
 		if (rc)
 			return rc;
-
-		/* Determine vmcore size. */
-		vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
 	} else {
 		pr_warn("Warning: Core image elf header is not sane\n");
 		return -EINVAL;
 	}
+
+	/* Determine vmcore size. */
+	vmcore_size = get_vmcore_size(elfcorebuf_sz, elfnotes_sz,
+				      &vmcore_list);
+
 	return 0;
 }
 


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-23  5:24 ` HATAYAMA Daisuke
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  0 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

This patch introduces mmap_vmcore().

Don't permit a writable or executable mapping, even with mprotect(),
because this mmap() is aimed at reading crash dump memory. A
non-writable mapping is also a requirement of remap_pfn_range() when
mapping linear pages onto non-consecutive physical pages; see
is_cow_mapping().

Set the VM_MIXEDMAP flag to remap memory both by remap_pfn_range()
and by remap_vmalloc_range_partial() for a single vma. do_munmap()
can then correctly clean up a vma partially remapped by the two
functions in the abnormal case. See zap_pte_range(), vm_normal_page()
and their comments for details.

On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is an unsigned long, which is 32 bits wide on
x86-32: 2^32 page frames of 4KB each cover exactly 2^44 bytes, 16TB.
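
As a hedged userspace sketch (the window size and error handling are
illustrative, not taken from makedumpfile), a dump filtering tool is
expected to use the new interface roughly like this:

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 64 << 20;	/* map a 64MB window */
		int fd = open("/proc/vmcore", O_RDONLY);
		void *p;

		if (fd < 0)
			return 1;
		/* PROT_READ only: PROT_WRITE and PROT_EXEC are
		 * rejected with -EPERM by mmap_vmcore(). */
		p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;
		/* ... filter and compress pages from p ... */
		munmap(p, len);
		close(fd);
		return 0;
	}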

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
---

 fs/proc/vmcore.c |   86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 86 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index f71157d..80221d7 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/crash_dump.h>
 #include <linux/list.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include <asm/io.h>
 #include "internal.h"
@@ -200,9 +201,94 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 	return acc;
 }
 
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+{
+	size_t size = vma->vm_end - vma->vm_start;
+	u64 start, end, len, tsz;
+	struct vmcore *m;
+
+	start = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	end = start + size;
+
+	if (size > vmcore_size || end > vmcore_size)
+		return -EINVAL;
+
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+		return -EPERM;
+
+	vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+	vma->vm_flags |= VM_MIXEDMAP;
+
+	len = 0;
+
+	if (start < elfcorebuf_sz) {
+		u64 pfn;
+
+		tsz = elfcorebuf_sz - start;
+		if (size < tsz)
+			tsz = size;
+		pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
+		if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
+				    vma->vm_page_prot))
+			return -EAGAIN;
+		size -= tsz;
+		start += tsz;
+		len += tsz;
+
+		if (size == 0)
+			return 0;
+	}
+
+	if (start < elfcorebuf_sz + elfnotes_sz) {
+		void *kaddr;
+
+		tsz = elfcorebuf_sz + elfnotes_sz - start;
+		if (size < tsz)
+			tsz = size;
+		kaddr = elfnotes_buf + start - elfcorebuf_sz;
+		if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
+						kaddr, tsz)) {
+			do_munmap(vma->vm_mm, vma->vm_start, len);
+			return -EAGAIN;
+		}
+		size -= tsz;
+		start += tsz;
+		len += tsz;
+
+		if (size == 0)
+			return 0;
+	}
+
+	list_for_each_entry(m, &vmcore_list, list) {
+		if (start < m->offset + m->size) {
+			u64 paddr = 0;
+
+			tsz = m->offset + m->size - start;
+			if (size < tsz)
+				tsz = size;
+			paddr = m->paddr + start - m->offset;
+			if (remap_pfn_range(vma, vma->vm_start + len,
+					    paddr >> PAGE_SHIFT, tsz,
+					    vma->vm_page_prot)) {
+				do_munmap(vma->vm_mm, vma->vm_start, len);
+				return -EAGAIN;
+			}
+			size -= tsz;
+			start += tsz;
+			len += tsz;
+
+			if (size == 0)
+				return 0;
+		}
+	}
+
+	return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
 	.read		= read_vmcore,
 	.llseek		= default_llseek,
+	.mmap		= mmap_vmcore,
 };
 
 static struct vmcore* __init get_new_element(void)


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  0 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel,
	zhangyanfei, jingbai.ma, linux-mm, riel, walken, hughd,
	kosaki.motohiro

This patch introduces mmap_vmcore().

Don't permit writable nor executable mapping even with mprotect()
because this mmap() is aimed at reading crash dump memory.
Non-writable mapping is also requirement of remap_pfn_range() when
mapping linear pages on non-consecutive physical pages; see
is_cow_mapping().

Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
remap_vmalloc_range_pertial at the same time for a single
vma. do_munmap() can correctly clean partially remapped vma with two
functions in abnormal case. See zap_pte_range(), vm_normal_page() and
their comments for details.

On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
---

 fs/proc/vmcore.c |   86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 86 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index f71157d..80221d7 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/crash_dump.h>
 #include <linux/list.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include <asm/io.h>
 #include "internal.h"
@@ -200,9 +201,94 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 	return acc;
 }
 
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+{
+	size_t size = vma->vm_end - vma->vm_start;
+	u64 start, end, len, tsz;
+	struct vmcore *m;
+
+	start = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	end = start + size;
+
+	if (size > vmcore_size || end > vmcore_size)
+		return -EINVAL;
+
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+		return -EPERM;
+
+	vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+	vma->vm_flags |= VM_MIXEDMAP;
+
+	len = 0;
+
+	if (start < elfcorebuf_sz) {
+		u64 pfn;
+
+		tsz = elfcorebuf_sz - start;
+		if (size < tsz)
+			tsz = size;
+		pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
+		if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
+				    vma->vm_page_prot))
+			return -EAGAIN;
+		size -= tsz;
+		start += tsz;
+		len += tsz;
+
+		if (size == 0)
+			return 0;
+	}
+
+	if (start < elfcorebuf_sz + elfnotes_sz) {
+		void *kaddr;
+
+		tsz = elfcorebuf_sz + elfnotes_sz - start;
+		if (size < tsz)
+			tsz = size;
+		kaddr = elfnotes_buf + start - elfcorebuf_sz;
+		if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
+						kaddr, tsz)) {
+			do_munmap(vma->vm_mm, vma->vm_start, len);
+			return -EAGAIN;
+		}
+		size -= tsz;
+		start += tsz;
+		len += tsz;
+
+		if (size == 0)
+			return 0;
+	}
+
+	list_for_each_entry(m, &vmcore_list, list) {
+		if (start < m->offset + m->size) {
+			u64 paddr = 0;
+
+			tsz = m->offset + m->size - start;
+			if (size < tsz)
+				tsz = size;
+			paddr = m->paddr + start - m->offset;
+			if (remap_pfn_range(vma, vma->vm_start + len,
+					    paddr >> PAGE_SHIFT, tsz,
+					    vma->vm_page_prot)) {
+				do_munmap(vma->vm_mm, vma->vm_start, len);
+				return -EAGAIN;
+			}
+			size -= tsz;
+			start += tsz;
+			len += tsz;
+
+			if (size == 0)
+				return 0;
+		}
+	}
+
+	return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
 	.read		= read_vmcore,
 	.llseek		= default_llseek,
+	.mmap		= mmap_vmcore,
 };
 
 static struct vmcore* __init get_new_element(void)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
@ 2013-05-23  5:25   ` HATAYAMA Daisuke
  0 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-23  5:25 UTC (permalink / raw)
  To: vgoyal, ebiederm, akpm
  Cc: riel, hughd, kexec, linux-kernel, lisa.mitchell, linux-mm,
	zhangyanfei, kosaki.motohiro, kumagai-atsushi, walken, cpw,
	jingbai.ma

This patch introduces mmap_vmcore().

Don't permit writable nor executable mapping even with mprotect()
because this mmap() is aimed at reading crash dump memory.
Non-writable mapping is also requirement of remap_pfn_range() when
mapping linear pages on non-consecutive physical pages; see
is_cow_mapping().

Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
remap_vmalloc_range_pertial at the same time for a single
vma. do_munmap() can correctly clean partially remapped vma with two
functions in abnormal case. See zap_pte_range(), vm_normal_page() and
their comments for details.

On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
---

 fs/proc/vmcore.c |   86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 86 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index f71157d..80221d7 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/crash_dump.h>
 #include <linux/list.h>
+#include <linux/vmalloc.h>
 #include <asm/uaccess.h>
 #include <asm/io.h>
 #include "internal.h"
@@ -200,9 +201,94 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 	return acc;
 }
 
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+{
+	size_t size = vma->vm_end - vma->vm_start;
+	u64 start, end, len, tsz;
+	struct vmcore *m;
+
+	start = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	end = start + size;
+
+	if (size > vmcore_size || end > vmcore_size)
+		return -EINVAL;
+
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+		return -EPERM;
+
+	vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+	vma->vm_flags |= VM_MIXEDMAP;
+
+	len = 0;
+
+	if (start < elfcorebuf_sz) {
+		u64 pfn;
+
+		tsz = elfcorebuf_sz - start;
+		if (size < tsz)
+			tsz = size;
+		pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
+		if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
+				    vma->vm_page_prot))
+			return -EAGAIN;
+		size -= tsz;
+		start += tsz;
+		len += tsz;
+
+		if (size == 0)
+			return 0;
+	}
+
+	if (start < elfcorebuf_sz + elfnotes_sz) {
+		void *kaddr;
+
+		tsz = elfcorebuf_sz + elfnotes_sz - start;
+		if (size < tsz)
+			tsz = size;
+		kaddr = elfnotes_buf + start - elfcorebuf_sz;
+		if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
+						kaddr, tsz)) {
+			do_munmap(vma->vm_mm, vma->vm_start, len);
+			return -EAGAIN;
+		}
+		size -= tsz;
+		start += tsz;
+		len += tsz;
+
+		if (size == 0)
+			return 0;
+	}
+
+	list_for_each_entry(m, &vmcore_list, list) {
+		if (start < m->offset + m->size) {
+			u64 paddr = 0;
+
+			tsz = m->offset + m->size - start;
+			if (size < tsz)
+				tsz = size;
+			paddr = m->paddr + start - m->offset;
+			if (remap_pfn_range(vma, vma->vm_start + len,
+					    paddr >> PAGE_SHIFT, tsz,
+					    vma->vm_page_prot)) {
+				do_munmap(vma->vm_mm, vma->vm_start, len);
+				return -EAGAIN;
+			}
+			size -= tsz;
+			start += tsz;
+			len += tsz;
+
+			if (size == 0)
+				return 0;
+		}
+	}
+
+	return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
 	.read		= read_vmcore,
 	.llseek		= default_llseek,
+	.mmap		= mmap_vmcore,
 };
 
 static struct vmcore* __init get_new_element(void)


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 2/9] vmcore: allocate buffer for ELF headers on page-size alignment
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 14:22     ` Vivek Goyal
  -1 siblings, 0 replies; 103+ messages in thread
From: Vivek Goyal @ 2013-05-23 14:22 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, May 23, 2013 at 02:25:07PM +0900, HATAYAMA Daisuke wrote:
> Allocate ELF headers on page-size boundary using __get_free_pages()
> instead of kmalloc().
> 
> A later patch will merge the PT_NOTE entries into a single unique one
> and decrease the buffer size actually used. Keep the original buffer
> size in the variable elfcorebuf_sz_orig so the buffer can be freed
> later, and keep the actually used buffer size, rounded up to the
> page-size boundary, in the variable elfcorebuf_sz.
> 
> The size of the part of the ELF buffer exported through /proc/vmcore
> is elfcorebuf_sz.
> 
> The range [elfcorebuf_sz, elfcorebuf_sz_orig] left by the merged and
> removed PT_NOTE entries is filled with 0.
> 
> Use the size of the ELF headers as the initial offset value in
> set_vmcore_list_offsets_elf{64,32} and
> process_ptload_program_headers_elf{64,32} so that the offset accounts
> for the holes up to the page boundary.
> 
> As a result, both set_vmcore_list_offsets_elf{64,32} have the same
> definition. Merge them as set_vmcore_list_offsets.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

Looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
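
In outline, the buffer lifecycle this patch introduces reads as
follows (a condensed sketch of the hunks below, with surrounding code
elided; it is not compilable on its own):

	/* allocate a page-aligned, zero-filled buffer for the ELF headers */
	elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
	elfcorebuf_sz = elfcorebuf_sz_orig;
	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
					       get_order(elfcorebuf_sz_orig));

	/* merging PT_NOTE headers shrinks the used size; zero the tail
	 * and round the used size up to the page boundary */
	memset(elfcorebuf + elfcorebuf_sz, 0, elfcorebuf_sz_orig - elfcorebuf_sz);
	elfcorebuf_sz = roundup(elfcorebuf_sz, PAGE_SIZE);

	/* on teardown, free with the original size, which determined
	 * the allocation order */
	free_pages((unsigned long)elfcorebuf, get_order(elfcorebuf_sz_orig));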

> ---
> 
>  fs/proc/vmcore.c |   94 ++++++++++++++++++++++++------------------------------
>  1 files changed, 42 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index ab0c92e..80fea97 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -32,6 +32,7 @@ static LIST_HEAD(vmcore_list);
>  /* Stores the pointer to the buffer containing kernel elf core headers. */
>  static char *elfcorebuf;
>  static size_t elfcorebuf_sz;
> +static size_t elfcorebuf_sz_orig;
>  
>  /* Total size of vmcore file. */
>  static u64 vmcore_size;
> @@ -186,7 +187,7 @@ static struct vmcore* __init get_new_element(void)
>  	return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
>  }
>  
> -static u64 __init get_vmcore_size_elf64(char *elfptr)
> +static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
>  {
>  	int i;
>  	u64 size;
> @@ -195,7 +196,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
>  
>  	ehdr_ptr = (Elf64_Ehdr *)elfptr;
>  	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> -	size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
> +	size = elfsz;
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>  		size += phdr_ptr->p_memsz;
>  		phdr_ptr++;
> @@ -203,7 +204,7 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
>  	return size;
>  }
>  
> -static u64 __init get_vmcore_size_elf32(char *elfptr)
> +static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
>  {
>  	int i;
>  	u64 size;
> @@ -212,7 +213,7 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
>  
>  	ehdr_ptr = (Elf32_Ehdr *)elfptr;
>  	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> -	size = sizeof(Elf32_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr));
> +	size = elfsz;
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
>  		size += phdr_ptr->p_memsz;
>  		phdr_ptr++;
> @@ -294,6 +295,8 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  	i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
>  	*elfsz = *elfsz - i;
>  	memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
> +	memset(elfptr + *elfsz, 0, i);
> +	*elfsz = roundup(*elfsz, PAGE_SIZE);
>  
>  	/* Modify e_phnum to reflect merged headers. */
>  	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
> @@ -375,6 +378,8 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  	i = (nr_ptnote - 1) * sizeof(Elf32_Phdr);
>  	*elfsz = *elfsz - i;
>  	memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf32_Ehdr)-sizeof(Elf32_Phdr)));
> +	memset(elfptr + *elfsz, 0, i);
> +	*elfsz = roundup(*elfsz, PAGE_SIZE);
>  
>  	/* Modify e_phnum to reflect merged headers. */
>  	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
> @@ -398,8 +403,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
>  	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
>  
>  	/* First program header is PT_NOTE header. */
> -	vmcore_off = sizeof(Elf64_Ehdr) +
> -			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
> +	vmcore_off = elfsz +
>  			phdr_ptr->p_memsz; /* Note sections */
>  
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> @@ -435,8 +439,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>  	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
>  
>  	/* First program header is PT_NOTE header. */
> -	vmcore_off = sizeof(Elf32_Ehdr) +
> -			(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr) +
> +	vmcore_off = elfsz +
>  			phdr_ptr->p_memsz; /* Note sections */
>  
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> @@ -459,38 +462,14 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>  }
>  
>  /* Sets offset fields of vmcore elements. */
> -static void __init set_vmcore_list_offsets_elf64(char *elfptr,
> -						struct list_head *vc_list)
> +static void __init set_vmcore_list_offsets(size_t elfsz,
> +					   struct list_head *vc_list)
>  {
>  	loff_t vmcore_off;
> -	Elf64_Ehdr *ehdr_ptr;
>  	struct vmcore *m;
>  
> -	ehdr_ptr = (Elf64_Ehdr *)elfptr;
> -
> -	/* Skip Elf header and program headers. */
> -	vmcore_off = sizeof(Elf64_Ehdr) +
> -			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
> -
> -	list_for_each_entry(m, vc_list, list) {
> -		m->offset = vmcore_off;
> -		vmcore_off += m->size;
> -	}
> -}
> -
> -/* Sets offset fields of vmcore elements. */
> -static void __init set_vmcore_list_offsets_elf32(char *elfptr,
> -						struct list_head *vc_list)
> -{
> -	loff_t vmcore_off;
> -	Elf32_Ehdr *ehdr_ptr;
> -	struct vmcore *m;
> -
> -	ehdr_ptr = (Elf32_Ehdr *)elfptr;
> -
>  	/* Skip Elf header and program headers. */
> -	vmcore_off = sizeof(Elf32_Ehdr) +
> -			(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
> +	vmcore_off = elfsz;
>  
>  	list_for_each_entry(m, vc_list, list) {
>  		m->offset = vmcore_off;
> @@ -526,30 +505,35 @@ static int __init parse_crash_elf64_headers(void)
>  	}
>  
>  	/* Read in all elf headers. */
> -	elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> -	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> +	elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> +	elfcorebuf_sz = elfcorebuf_sz_orig;
> +	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +					       get_order(elfcorebuf_sz_orig));
>  	if (!elfcorebuf)
>  		return -ENOMEM;
>  	addr = elfcorehdr_addr;
> -	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> +	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
>  	if (rc < 0) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
>  	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
>  							&vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -581,30 +565,35 @@ static int __init parse_crash_elf32_headers(void)
>  	}
>  
>  	/* Read in all elf headers. */
> -	elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> -	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> +	elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> +	elfcorebuf_sz = elfcorebuf_sz_orig;
> +	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +					       get_order(elfcorebuf_sz_orig));
>  	if (!elfcorebuf)
>  		return -ENOMEM;
>  	addr = elfcorehdr_addr;
> -	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> +	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
>  	if (rc < 0) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
>  	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
>  								&vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -629,14 +618,14 @@ static int __init parse_crash_elf_headers(void)
>  			return rc;
>  
>  		/* Determine vmcore size. */
> -		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
> +		vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
>  	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
>  		rc = parse_crash_elf32_headers();
>  		if (rc)
>  			return rc;
>  
>  		/* Determine vmcore size. */
> -		vmcore_size = get_vmcore_size_elf32(elfcorebuf);
> +		vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
>  	} else {
>  		pr_warn("Warning: Core image elf header is not sane\n");
>  		return -EINVAL;
> @@ -683,7 +672,8 @@ void vmcore_cleanup(void)
>  		list_del(&m->list);
>  		kfree(m);
>  	}
> -	kfree(elfcorebuf);
> +	free_pages((unsigned long)elfcorebuf,
> +		   get_order(elfcorebuf_sz_orig));
>  	elfcorebuf = NULL;
>  }
>  EXPORT_SYMBOL_GPL(vmcore_cleanup);

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 6/9] vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 14:28     ` Vivek Goyal
  -1 siblings, 0 replies; 103+ messages in thread
From: Vivek Goyal @ 2013-05-23 14:28 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, May 23, 2013 at 02:25:30PM +0900, HATAYAMA Daisuke wrote:
> The reasons why we don't allocate the ELF note segment in the 1st
> kernel (old memory) on a page boundary are to keep backward
> compatibility for old kernels, and that doing so would waste a
> non-trivial amount of memory on rounding up to the page boundary,
> since most of the buffers are in the per-cpu area.
> 
> ELF notes are per-cpu, so the total size of the ELF note segments
> depends on the number of CPUs. The current maximum number of CPUs on
> x86_64 is 5192, and there is already a system with 4192 CPUs from
> SGI, where the total size amounts to 1MB. This can grow in the near
> future, or possibly even now on another architecture with a larger
> per-cpu note size. Thus, to avoid failures of large memory
> allocations, we allocate the ELF note segment buffer in vmalloc
> memory.
> 
> This patch adds the elfnotes_buf and elfnotes_sz variables to keep a
> pointer to the ELF note segment buffer and its size. There is no
> longer a vmcore object corresponding to the ELF note segment in
> vmcore_list. Accordingly, read_vmcore() gains a new case for the ELF
> note segment, and set_vmcore_list_offsets_elf{64,32}() and the other
> helper functions start calculating offsets from the sum of the ELF
> header size and the ELF note segment size.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

Looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
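
The size computation at the heart of this change can be summarized as
follows (a condensed sketch of update_note_header_size_elf64() in the
hunks below, using the same notes_section and max_sz names): walk the
Elf64_Nhdr records until the p_memsz recorded in the program header is
exhausted or a zero-length name terminates the list, rounding name and
descriptor sizes up to 4-byte alignment as the ELF format requires.

	Elf64_Nhdr *nhdr_ptr = notes_section;
	u64 sz, real_sz = 0;

	while (real_sz < max_sz) {
		if (nhdr_ptr->n_namesz == 0)	/* terminator record */
			break;
		sz = sizeof(Elf64_Nhdr) +
			roundup(nhdr_ptr->n_namesz, 4) +
			roundup(nhdr_ptr->n_descsz, 4);
		real_sz += sz;
		nhdr_ptr = (Elf64_Nhdr *)((char *)nhdr_ptr + sz);
	}
	/* real_sz is then stored back into the PT_NOTE entry's p_memsz */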

> ---
> 
>  fs/proc/vmcore.c |  355 ++++++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 288 insertions(+), 67 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 686068d..937709d 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -34,6 +34,9 @@ static char *elfcorebuf;
>  static size_t elfcorebuf_sz;
>  static size_t elfcorebuf_sz_orig;
>  
> +static char *elfnotes_buf;
> +static size_t elfnotes_sz;
> +
>  /* Total size of vmcore file. */
>  static u64 vmcore_size;
>  
> @@ -154,6 +157,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
>  			return acc;
>  	}
>  
> +	/* Read Elf note segment */
> +	if (*fpos < elfcorebuf_sz + elfnotes_sz) {
> +		void *kaddr;
> +
> +		tsz = elfcorebuf_sz + elfnotes_sz - *fpos;
> +		if (buflen < tsz)
> +			tsz = buflen;
> +		kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
> +		if (copy_to_user(buffer, kaddr, tsz))
> +			return -EFAULT;
> +		buflen -= tsz;
> +		*fpos += tsz;
> +		buffer += tsz;
> +		acc += tsz;
> +
> +		/* leave now if filled buffer already */
> +		if (buflen == 0)
> +			return acc;
> +	}
> +
>  	list_for_each_entry(m, &vmcore_list, list) {
>  		if (*fpos < m->offset + m->size) {
>  			tsz = m->offset + m->size - *fpos;
> @@ -221,27 +244,27 @@ static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
>  	return size;
>  }
>  
> -/* Merges all the PT_NOTE headers into one. */
> -static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> -						struct list_head *vc_list)
> +/**
> + * update_note_header_size_elf64 - update p_memsz member of each PT_NOTE entry
> + *
> + * @ehdr_ptr: ELF header
> + *
> + * This function updates p_memsz member of each PT_NOTE entry in the
> + * program header table pointed to by @ehdr_ptr to real size of ELF
> + * note segment.
> + */
> +static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>  {
> -	int i, nr_ptnote=0, rc=0;
> -	char *tmp;
> -	Elf64_Ehdr *ehdr_ptr;
> -	Elf64_Phdr phdr, *phdr_ptr;
> +	int i, rc=0;
> +	Elf64_Phdr *phdr_ptr;
>  	Elf64_Nhdr *nhdr_ptr;
> -	u64 phdr_sz = 0, note_off;
>  
> -	ehdr_ptr = (Elf64_Ehdr *)elfptr;
> -	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> +	phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> -		int j;
>  		void *notes_section;
> -		struct vmcore *new;
>  		u64 offset, max_sz, sz, real_sz = 0;
>  		if (phdr_ptr->p_type != PT_NOTE)
>  			continue;
> -		nr_ptnote++;
>  		max_sz = phdr_ptr->p_memsz;
>  		offset = phdr_ptr->p_offset;
>  		notes_section = kmalloc(max_sz, GFP_KERNEL);
> @@ -253,7 +276,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  			return rc;
>  		}
>  		nhdr_ptr = notes_section;
> -		for (j = 0; j < max_sz; j += sz) {
> +		while (real_sz < max_sz) {
>  			if (nhdr_ptr->n_namesz == 0)
>  				break;
>  			sz = sizeof(Elf64_Nhdr) +
> @@ -262,26 +285,122 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  			real_sz += sz;
>  			nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
>  		}
> -
> -		/* Add this contiguous chunk of notes section to vmcore list.*/
> -		new = get_new_element();
> -		if (!new) {
> -			kfree(notes_section);
> -			return -ENOMEM;
> -		}
> -		new->paddr = phdr_ptr->p_offset;
> -		new->size = real_sz;
> -		list_add_tail(&new->list, vc_list);
> -		phdr_sz += real_sz;
>  		kfree(notes_section);
> +		phdr_ptr->p_memsz = real_sz;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * get_note_number_and_size_elf64 - get the number of PT_NOTE program
> + * headers and sum of real size of their ELF note segment headers and
> + * data.
> + *
> + * @ehdr_ptr: ELF header
> + * @nr_ptnote: buffer for the number of PT_NOTE program headers
> + * @sz_ptnote: buffer for size of unique PT_NOTE program header
> + *
> + * This function is used to merge multiple PT_NOTE program headers
> + * into a unique single one. The resulting unique entry will have
> + * @sz_ptnote in its phdr->p_mem.
> + *
> + * It is assumed that program headers with PT_NOTE type pointed to by
> + * @ehdr_ptr has already been updated by update_note_header_size_elf64
> + * and each of PT_NOTE program headers has actual ELF note segment
> + * size in its p_memsz member.
> + */
> +static int __init get_note_number_and_size_elf64(const Elf64_Ehdr *ehdr_ptr,
> +						 int *nr_ptnote, u64 *sz_ptnote)
> +{
> +	int i;
> +	Elf64_Phdr *phdr_ptr;
> +
> +	*nr_ptnote = *sz_ptnote = 0;
> +
> +	phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
> +	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> +		if (phdr_ptr->p_type != PT_NOTE)
> +			continue;
> +		*nr_ptnote += 1;
> +		*sz_ptnote += phdr_ptr->p_memsz;
>  	}
>  
> +	return 0;
> +}
> +
> +/**
> + * copy_notes_elf64 - copy ELF note segments in a given buffer
> + *
> + * @ehdr_ptr: ELF header
> + * @notes_buf: buffer into which ELF note segments are copied
> + *
> + * This function is used to copy ELF note segment in the 1st kernel
> + * into the buffer @notes_buf in the 2nd kernel. It is assumed that
> + * size of the buffer @notes_buf is equal to or larger than sum of the
> + * real ELF note segment headers and data.
> + *
> + * It is assumed that program headers with PT_NOTE type pointed to by
> + * @ehdr_ptr has already been updated by update_note_header_size_elf64
> + * and each of PT_NOTE program headers has actual ELF note segment
> + * size in its p_memsz member.
> + */
> +static int __init copy_notes_elf64(const Elf64_Ehdr *ehdr_ptr, char *notes_buf)
> +{
> +	int i, rc=0;
> +	Elf64_Phdr *phdr_ptr;
> +
> +	phdr_ptr = (Elf64_Phdr*)(ehdr_ptr + 1);
> +
> +	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> +		u64 offset;
> +		if (phdr_ptr->p_type != PT_NOTE)
> +			continue;
> +		offset = phdr_ptr->p_offset;
> +		rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
> +		if (rc < 0)
> +			return rc;
> +		notes_buf += phdr_ptr->p_memsz;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Merges all the PT_NOTE headers into one. */
> +static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> +					   char **notes_buf, size_t *notes_sz)
> +{
> +	int i, nr_ptnote=0, rc=0;
> +	char *tmp;
> +	Elf64_Ehdr *ehdr_ptr;
> +	Elf64_Phdr phdr;
> +	u64 phdr_sz = 0, note_off;
> +
> +	ehdr_ptr = (Elf64_Ehdr *)elfptr;
> +
> +	rc = update_note_header_size_elf64(ehdr_ptr);
> +	if (rc < 0)
> +		return rc;
> +
> +	rc = get_note_number_and_size_elf64(ehdr_ptr, &nr_ptnote, &phdr_sz);
> +	if (rc < 0)
> +		return rc;
> +
> +	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
> +	*notes_buf = vzalloc(*notes_sz);
> +	if (!*notes_buf)
> +		return -ENOMEM;
> +
> +	rc = copy_notes_elf64(ehdr_ptr, *notes_buf);
> +	if (rc < 0)
> +		return rc;
> +
>  	/* Prepare merged PT_NOTE program header. */
>  	phdr.p_type    = PT_NOTE;
>  	phdr.p_flags   = 0;
>  	note_off = sizeof(Elf64_Ehdr) +
>  			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> -	phdr.p_offset  = note_off;
> +	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>  	phdr.p_vaddr   = phdr.p_paddr = 0;
>  	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>  	phdr.p_align   = 0;
> @@ -304,27 +423,27 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  	return 0;
>  }
>  
> -/* Merges all the PT_NOTE headers into one. */
> -static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> -						struct list_head *vc_list)
> +/**
> + * update_note_header_size_elf32 - update p_memsz member of each PT_NOTE entry
> + *
> + * @ehdr_ptr: ELF header
> + *
> + * This function updates p_memsz member of each PT_NOTE entry in the
> + * program header table pointed to by @ehdr_ptr to real size of ELF
> + * note segment.
> + */
> +static int __init update_note_header_size_elf32(const Elf32_Ehdr *ehdr_ptr)
>  {
> -	int i, nr_ptnote=0, rc=0;
> -	char *tmp;
> -	Elf32_Ehdr *ehdr_ptr;
> -	Elf32_Phdr phdr, *phdr_ptr;
> +	int i, rc=0;
> +	Elf32_Phdr *phdr_ptr;
>  	Elf32_Nhdr *nhdr_ptr;
> -	u64 phdr_sz = 0, note_off;
>  
> -	ehdr_ptr = (Elf32_Ehdr *)elfptr;
> -	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> +	phdr_ptr = (Elf32_Phdr *)(ehdr_ptr + 1);
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> -		int j;
>  		void *notes_section;
> -		struct vmcore *new;
>  		u64 offset, max_sz, sz, real_sz = 0;
>  		if (phdr_ptr->p_type != PT_NOTE)
>  			continue;
> -		nr_ptnote++;
>  		max_sz = phdr_ptr->p_memsz;
>  		offset = phdr_ptr->p_offset;
>  		notes_section = kmalloc(max_sz, GFP_KERNEL);
> @@ -336,7 +455,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  			return rc;
>  		}
>  		nhdr_ptr = notes_section;
> -		for (j = 0; j < max_sz; j += sz) {
> +		while (real_sz < max_sz) {
>  			if (nhdr_ptr->n_namesz == 0)
>  				break;
>  			sz = sizeof(Elf32_Nhdr) +
> @@ -345,26 +464,122 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  			real_sz += sz;
>  			nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
>  		}
> -
> -		/* Add this contiguous chunk of notes section to vmcore list.*/
> -		new = get_new_element();
> -		if (!new) {
> -			kfree(notes_section);
> -			return -ENOMEM;
> -		}
> -		new->paddr = phdr_ptr->p_offset;
> -		new->size = real_sz;
> -		list_add_tail(&new->list, vc_list);
> -		phdr_sz += real_sz;
>  		kfree(notes_section);
> +		phdr_ptr->p_memsz = real_sz;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * get_note_number_and_size_elf32 - get the number of PT_NOTE program
> + * headers and sum of real size of their ELF note segment headers and
> + * data.
> + *
> + * @ehdr_ptr: ELF header
> + * @nr_ptnote: buffer for the number of PT_NOTE program headers
> + * @sz_ptnote: buffer for size of unique PT_NOTE program header
> + *
> + * This function is used to merge multiple PT_NOTE program headers
> + * into a unique single one. The resulting unique entry will have
> + * @sz_ptnote in its phdr->p_mem.
> + *
> + * It is assumed that program headers with PT_NOTE type pointed to by
> + * @ehdr_ptr has already been updated by update_note_header_size_elf32
> + * and each of PT_NOTE program headers has actual ELF note segment
> + * size in its p_memsz member.
> + */
> +static int __init get_note_number_and_size_elf32(const Elf32_Ehdr *ehdr_ptr,
> +						 int *nr_ptnote, u64 *sz_ptnote)
> +{
> +	int i;
> +	Elf32_Phdr *phdr_ptr;
> +
> +	*nr_ptnote = *sz_ptnote = 0;
> +
> +	phdr_ptr = (Elf32_Phdr *)(ehdr_ptr + 1);
> +	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> +		if (phdr_ptr->p_type != PT_NOTE)
> +			continue;
> +		*nr_ptnote += 1;
> +		*sz_ptnote += phdr_ptr->p_memsz;
>  	}
>  
> +	return 0;
> +}
> +
> +/**
> + * copy_notes_elf32 - copy ELF note segments in a given buffer
> + *
> + * @ehdr_ptr: ELF header
> + * @notes_buf: buffer into which ELF note segments are copied
> + *
> + * This function is used to copy ELF note segment in the 1st kernel
> + * into the buffer @notes_buf in the 2nd kernel. It is assumed that
> + * size of the buffer @notes_buf is equal to or larger than sum of the
> + * real ELF note segment headers and data.
> + *
> + * It is assumed that program headers with PT_NOTE type pointed to by
> + * @ehdr_ptr has already been updated by update_note_header_size_elf32
> + * and each of PT_NOTE program headers has actual ELF note segment
> + * size in its p_memsz member.
> + */
> +static int __init copy_notes_elf32(const Elf32_Ehdr *ehdr_ptr, char *notes_buf)
> +{
> +	int i, rc=0;
> +	Elf32_Phdr *phdr_ptr;
> +
> +	phdr_ptr = (Elf32_Phdr*)(ehdr_ptr + 1);
> +
> +	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> +		u64 offset;
> +		if (phdr_ptr->p_type != PT_NOTE)
> +			continue;
> +		offset = phdr_ptr->p_offset;
> +		rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
> +		if (rc < 0)
> +			return rc;
> +		notes_buf += phdr_ptr->p_memsz;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Merges all the PT_NOTE headers into one. */
> +static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> +					   char **notes_buf, size_t *notes_sz)
> +{
> +	int i, nr_ptnote=0, rc=0;
> +	char *tmp;
> +	Elf32_Ehdr *ehdr_ptr;
> +	Elf32_Phdr phdr;
> +	u64 phdr_sz = 0, note_off;
> +
> +	ehdr_ptr = (Elf32_Ehdr *)elfptr;
> +
> +	rc = update_note_header_size_elf32(ehdr_ptr);
> +	if (rc < 0)
> +		return rc;
> +
> +	rc = get_note_number_and_size_elf32(ehdr_ptr, &nr_ptnote, &phdr_sz);
> +	if (rc < 0)
> +		return rc;
> +
> +	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
> +	*notes_buf = vzalloc(*notes_sz);
> +	if (!*notes_buf)
> +		return -ENOMEM;
> +
> +	rc = copy_notes_elf32(ehdr_ptr, *notes_buf);
> +	if (rc < 0)
> +		return rc;
> +
>  	/* Prepare merged PT_NOTE program header. */
>  	phdr.p_type    = PT_NOTE;
>  	phdr.p_flags   = 0;
>  	note_off = sizeof(Elf32_Ehdr) +
>  			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
> -	phdr.p_offset  = note_off;
> +	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>  	phdr.p_vaddr   = phdr.p_paddr = 0;
>  	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>  	phdr.p_align   = 0;
> @@ -391,6 +606,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>   * the new offset fields of exported program headers. */
>  static int __init process_ptload_program_headers_elf64(char *elfptr,
>  						size_t elfsz,
> +						size_t elfnotes_sz,
>  						struct list_head *vc_list)
>  {
>  	int i;
> @@ -402,9 +618,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
>  	ehdr_ptr = (Elf64_Ehdr *)elfptr;
>  	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
>  
> -	/* First program header is PT_NOTE header. */
> -	vmcore_off = elfsz +
> -			phdr_ptr->p_memsz; /* Note sections */
> +	/* Skip Elf header, program headers and Elf note segment. */
> +	vmcore_off = elfsz + elfnotes_sz;
>  
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>  		u64 paddr, start, end, size;
> @@ -434,6 +649,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
>  
>  static int __init process_ptload_program_headers_elf32(char *elfptr,
>  						size_t elfsz,
> +						size_t elfnotes_sz,
>  						struct list_head *vc_list)
>  {
>  	int i;
> @@ -445,9 +661,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>  	ehdr_ptr = (Elf32_Ehdr *)elfptr;
>  	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
>  
> -	/* First program header is PT_NOTE header. */
> -	vmcore_off = elfsz +
> -			phdr_ptr->p_memsz; /* Note sections */
> +	/* Skip Elf header, program headers and Elf note segment. */
> +	vmcore_off = elfsz + elfnotes_sz;
>  
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>  		u64 paddr, start, end, size;
> @@ -476,14 +691,14 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>  }
>  
>  /* Sets offset fields of vmcore elements. */
> -static void __init set_vmcore_list_offsets(size_t elfsz,
> +static void __init set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
>  					   struct list_head *vc_list)
>  {
>  	loff_t vmcore_off;
>  	struct vmcore *m;
>  
> -	/* Skip Elf header and program headers. */
> -	vmcore_off = elfsz;
> +	/* Skip Elf header, program headers and Elf note segment. */
> +	vmcore_off = elfsz + elfnotes_sz;
>  
>  	list_for_each_entry(m, vc_list, list) {
>  		m->offset = vmcore_off;
> @@ -534,20 +749,22 @@ static int __init parse_crash_elf64_headers(void)
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
> -	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> +	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz,
> +				      &elfnotes_buf, &elfnotes_sz);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
> -							&vmcore_list);
> +						  elfnotes_sz,
> +						  &vmcore_list);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -594,20 +811,22 @@ static int __init parse_crash_elf32_headers(void)
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
> -	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> +	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz,
> +				      &elfnotes_buf, &elfnotes_sz);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
> -								&vmcore_list);
> +						  elfnotes_sz,
> +						  &vmcore_list);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -686,6 +905,8 @@ void vmcore_cleanup(void)
>  		list_del(&m->list);
>  		kfree(m);
>  	}
> +	vfree(elfnotes_buf);
> +	elfnotes_buf = NULL;
>  	free_pages((unsigned long)elfcorebuf,
>  		   get_order(elfcorebuf_sz_orig));
>  	elfcorebuf = NULL;

^ permalink raw reply	[flat|nested] 103+ messages in thread

>  	note_off = sizeof(Elf64_Ehdr) +
>  			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> -	phdr.p_offset  = note_off;
> +	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>  	phdr.p_vaddr   = phdr.p_paddr = 0;
>  	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>  	phdr.p_align   = 0;
> @@ -304,27 +423,27 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  	return 0;
>  }
>  
> -/* Merges all the PT_NOTE headers into one. */
> -static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> -						struct list_head *vc_list)
> +/**
> + * update_note_header_size_elf32 - update p_memsz member of each PT_NOTE entry
> + *
> + * @ehdr_ptr: ELF header
> + *
> + * This function updates p_memsz member of each PT_NOTE entry in the
> + * program header table pointed to by @ehdr_ptr to real size of ELF
> + * note segment.
> + */
> +static int __init update_note_header_size_elf32(const Elf32_Ehdr *ehdr_ptr)
>  {
> -	int i, nr_ptnote=0, rc=0;
> -	char *tmp;
> -	Elf32_Ehdr *ehdr_ptr;
> -	Elf32_Phdr phdr, *phdr_ptr;
> +	int i, rc=0;
> +	Elf32_Phdr *phdr_ptr;
>  	Elf32_Nhdr *nhdr_ptr;
> -	u64 phdr_sz = 0, note_off;
>  
> -	ehdr_ptr = (Elf32_Ehdr *)elfptr;
> -	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> +	phdr_ptr = (Elf32_Phdr *)(ehdr_ptr + 1);
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> -		int j;
>  		void *notes_section;
> -		struct vmcore *new;
>  		u64 offset, max_sz, sz, real_sz = 0;
>  		if (phdr_ptr->p_type != PT_NOTE)
>  			continue;
> -		nr_ptnote++;
>  		max_sz = phdr_ptr->p_memsz;
>  		offset = phdr_ptr->p_offset;
>  		notes_section = kmalloc(max_sz, GFP_KERNEL);
> @@ -336,7 +455,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  			return rc;
>  		}
>  		nhdr_ptr = notes_section;
> -		for (j = 0; j < max_sz; j += sz) {
> +		while (real_sz < max_sz) {
>  			if (nhdr_ptr->n_namesz == 0)
>  				break;
>  			sz = sizeof(Elf32_Nhdr) +
> @@ -345,26 +464,122 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  			real_sz += sz;
>  			nhdr_ptr = (Elf32_Nhdr*)((char*)nhdr_ptr + sz);
>  		}
> -
> -		/* Add this contiguous chunk of notes section to vmcore list.*/
> -		new = get_new_element();
> -		if (!new) {
> -			kfree(notes_section);
> -			return -ENOMEM;
> -		}
> -		new->paddr = phdr_ptr->p_offset;
> -		new->size = real_sz;
> -		list_add_tail(&new->list, vc_list);
> -		phdr_sz += real_sz;
>  		kfree(notes_section);
> +		phdr_ptr->p_memsz = real_sz;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * get_note_number_and_size_elf32 - get the number of PT_NOTE program
> + * headers and sum of real size of their ELF note segment headers and
> + * data.
> + *
> + * @ehdr_ptr: ELF header
> + * @nr_ptnote: buffer for the number of PT_NOTE program headers
> + * @sz_ptnote: buffer for size of unique PT_NOTE program header
> + *
> + * This function is used to merge multiple PT_NOTE program headers
> + * into a unique single one. The resulting unique entry will have
> + * @sz_ptnote in its phdr->p_mem.
> + *
> + * It is assumed that program headers with PT_NOTE type pointed to by
> + * @ehdr_ptr has already been updated by update_note_header_size_elf32
> + * and each of PT_NOTE program headers has actual ELF note segment
> + * size in its p_memsz member.
> + */
> +static int __init get_note_number_and_size_elf32(const Elf32_Ehdr *ehdr_ptr,
> +						 int *nr_ptnote, u64 *sz_ptnote)
> +{
> +	int i;
> +	Elf32_Phdr *phdr_ptr;
> +
> +	*nr_ptnote = *sz_ptnote = 0;
> +
> +	phdr_ptr = (Elf32_Phdr *)(ehdr_ptr + 1);
> +	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> +		if (phdr_ptr->p_type != PT_NOTE)
> +			continue;
> +		*nr_ptnote += 1;
> +		*sz_ptnote += phdr_ptr->p_memsz;
>  	}
>  
> +	return 0;
> +}
> +
> +/**
> + * copy_notes_elf32 - copy ELF note segments in a given buffer
> + *
> + * @ehdr_ptr: ELF header
> + * @notes_buf: buffer into which ELF note segments are copied
> + *
> + * This function is used to copy ELF note segment in the 1st kernel
> + * into the buffer @notes_buf in the 2nd kernel. It is assumed that
> + * size of the buffer @notes_buf is equal to or larger than sum of the
> + * real ELF note segment headers and data.
> + *
> + * It is assumed that program headers with PT_NOTE type pointed to by
> + * @ehdr_ptr has already been updated by update_note_header_size_elf32
> + * and each of PT_NOTE program headers has actual ELF note segment
> + * size in its p_memsz member.
> + */
> +static int __init copy_notes_elf32(const Elf32_Ehdr *ehdr_ptr, char *notes_buf)
> +{
> +	int i, rc=0;
> +	Elf32_Phdr *phdr_ptr;
> +
> +	phdr_ptr = (Elf32_Phdr*)(ehdr_ptr + 1);
> +
> +	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> +		u64 offset;
> +		if (phdr_ptr->p_type != PT_NOTE)
> +			continue;
> +		offset = phdr_ptr->p_offset;
> +		rc = read_from_oldmem(notes_buf, phdr_ptr->p_memsz, &offset, 0);
> +		if (rc < 0)
> +			return rc;
> +		notes_buf += phdr_ptr->p_memsz;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Merges all the PT_NOTE headers into one. */
> +static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
> +					   char **notes_buf, size_t *notes_sz)
> +{
> +	int i, nr_ptnote=0, rc=0;
> +	char *tmp;
> +	Elf32_Ehdr *ehdr_ptr;
> +	Elf32_Phdr phdr;
> +	u64 phdr_sz = 0, note_off;
> +
> +	ehdr_ptr = (Elf32_Ehdr *)elfptr;
> +
> +	rc = update_note_header_size_elf32(ehdr_ptr);
> +	if (rc < 0)
> +		return rc;
> +
> +	rc = get_note_number_and_size_elf32(ehdr_ptr, &nr_ptnote, &phdr_sz);
> +	if (rc < 0)
> +		return rc;
> +
> +	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
> +	*notes_buf = vzalloc(*notes_sz);
> +	if (!*notes_buf)
> +		return -ENOMEM;
> +
> +	rc = copy_notes_elf32(ehdr_ptr, *notes_buf);
> +	if (rc < 0)
> +		return rc;
> +
>  	/* Prepare merged PT_NOTE program header. */
>  	phdr.p_type    = PT_NOTE;
>  	phdr.p_flags   = 0;
>  	note_off = sizeof(Elf32_Ehdr) +
>  			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf32_Phdr);
> -	phdr.p_offset  = note_off;
> +	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>  	phdr.p_vaddr   = phdr.p_paddr = 0;
>  	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>  	phdr.p_align   = 0;
> @@ -391,6 +606,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>   * the new offset fields of exported program headers. */
>  static int __init process_ptload_program_headers_elf64(char *elfptr,
>  						size_t elfsz,
> +						size_t elfnotes_sz,
>  						struct list_head *vc_list)
>  {
>  	int i;
> @@ -402,9 +618,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
>  	ehdr_ptr = (Elf64_Ehdr *)elfptr;
>  	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
>  
> -	/* First program header is PT_NOTE header. */
> -	vmcore_off = elfsz +
> -			phdr_ptr->p_memsz; /* Note sections */
> +	/* Skip Elf header, program headers and Elf note segment. */
> +	vmcore_off = elfsz + elfnotes_sz;
>  
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>  		u64 paddr, start, end, size;
> @@ -434,6 +649,7 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
>  
>  static int __init process_ptload_program_headers_elf32(char *elfptr,
>  						size_t elfsz,
> +						size_t elfnotes_sz,
>  						struct list_head *vc_list)
>  {
>  	int i;
> @@ -445,9 +661,8 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>  	ehdr_ptr = (Elf32_Ehdr *)elfptr;
>  	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr)); /* PT_NOTE hdr */
>  
> -	/* First program header is PT_NOTE header. */
> -	vmcore_off = elfsz +
> -			phdr_ptr->p_memsz; /* Note sections */
> +	/* Skip Elf header, program headers and Elf note segment. */
> +	vmcore_off = elfsz + elfnotes_sz;
>  
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>  		u64 paddr, start, end, size;
> @@ -476,14 +691,14 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>  }
>  
>  /* Sets offset fields of vmcore elements. */
> -static void __init set_vmcore_list_offsets(size_t elfsz,
> +static void __init set_vmcore_list_offsets(size_t elfsz, size_t elfnotes_sz,
>  					   struct list_head *vc_list)
>  {
>  	loff_t vmcore_off;
>  	struct vmcore *m;
>  
> -	/* Skip Elf header and program headers. */
> -	vmcore_off = elfsz;
> +	/* Skip Elf header, program headers and Elf note segment. */
> +	vmcore_off = elfsz + elfnotes_sz;
>  
>  	list_for_each_entry(m, vc_list, list) {
>  		m->offset = vmcore_off;
> @@ -534,20 +749,22 @@ static int __init parse_crash_elf64_headers(void)
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
> -	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> +	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz,
> +				      &elfnotes_buf, &elfnotes_sz);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
> -							&vmcore_list);
> +						  elfnotes_sz,
> +						  &vmcore_list);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -594,20 +811,22 @@ static int __init parse_crash_elf32_headers(void)
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
> -	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
> +	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz,
> +				      &elfnotes_buf, &elfnotes_sz);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
> -								&vmcore_list);
> +						  elfnotes_sz,
> +						  &vmcore_list);
>  	if (rc) {
>  		free_pages((unsigned long)elfcorebuf,
>  			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, elfnotes_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -686,6 +905,8 @@ void vmcore_cleanup(void)
>  		list_del(&m->list);
>  		kfree(m);
>  	}
> +	vfree(elfnotes_buf);
> +	elfnotes_buf = NULL;
>  	free_pages((unsigned long)elfcorebuf,
>  		   get_order(elfcorebuf_sz_orig));
>  	elfcorebuf = NULL;
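
The while-loops in update_note_header_size_elf{64,32}() above walk the
raw note data to find how many bytes the notes actually occupy, since
the p_memsz exported by the crashed kernel may only be a padded upper
bound. For reference, a minimal standalone C sketch of the same
traversal (an illustrative helper, not part of the patch), assuming the
usual ELF rule that a note's name and descriptor are each padded to a
4-byte boundary:

    #include <elf.h>
    #include <stddef.h>

    /* Return the number of bytes actually occupied by the ELF64 notes
     * in buf, mirroring the accumulation in the patch's
     * update_note_header_size_elf64(). */
    size_t notes_real_size(const char *buf, size_t max_sz)
    {
            size_t real_sz = 0;

            while (real_sz < max_sz) {
                    const Elf64_Nhdr *nhdr =
                            (const Elf64_Nhdr *)(buf + real_sz);

                    if (nhdr->n_namesz == 0)  /* empty note terminates */
                            break;
                    /* header, then name and descriptor, 4-byte padded */
                    real_sz += sizeof(*nhdr)
                            + (((size_t)nhdr->n_namesz + 3) & ~(size_t)3)
                            + (((size_t)nhdr->n_descsz + 3) & ~(size_t)3);
            }
            return real_sz;
    }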


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 7/9] vmcore: Allow user process to remap ELF note segment buffer
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 14:32     ` Vivek Goyal
  0 siblings, 0 replies; 103+ messages in thread
From: Vivek Goyal @ 2013-05-23 14:32 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, May 23, 2013 at 02:25:36PM +0900, HATAYAMA Daisuke wrote:
> The ELF note segment has now been copied into a buffer on vmalloc
> memory. To allow a user process to remap the ELF note segment buffer
> with remap_vmalloc_range(), the corresponding VM area object has to
> have the VM_USERMAP flag set.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

Looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
> 
>  fs/proc/vmcore.c |   14 ++++++++++++++
>  1 files changed, 14 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 937709d..9de4d91 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -375,6 +375,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  	Elf64_Ehdr *ehdr_ptr;
>  	Elf64_Phdr phdr;
>  	u64 phdr_sz = 0, note_off;
> +	struct vm_struct *vm;
>  
>  	ehdr_ptr = (Elf64_Ehdr *)elfptr;
>  
> @@ -391,6 +392,12 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  	if (!*notes_buf)
>  		return -ENOMEM;
>  
> +	/* Allow users to remap ELF note segment buffer on vmalloc
> +	 * memory using remap_vmalloc_range. */
> +	vm = find_vm_area(*notes_buf);
> +	BUG_ON(!vm);
> +	vm->flags |= VM_USERMAP;
> +
>  	rc = copy_notes_elf64(ehdr_ptr, *notes_buf);
>  	if (rc < 0)
>  		return rc;
> @@ -554,6 +561,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  	Elf32_Ehdr *ehdr_ptr;
>  	Elf32_Phdr phdr;
>  	u64 phdr_sz = 0, note_off;
> +	struct vm_struct *vm;
>  
>  	ehdr_ptr = (Elf32_Ehdr *)elfptr;
>  
> @@ -570,6 +578,12 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  	if (!*notes_buf)
>  		return -ENOMEM;
>  
> +	/* Allow users to remap ELF note segment buffer on vmalloc
> +	 * memory using remap_vmalloc_range. */
> +	vm = find_vm_area(*notes_buf);
> +	BUG_ON(!vm);
> +	vm->flags |= VM_USERMAP;
> +
>  	rc = copy_notes_elf32(ehdr_ptr, *notes_buf);
>  	if (rc < 0)
>  		return rc;

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 8/9] vmcore: calculate vmcore file size from buffer size and total size of vmcore objects
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 14:34     ` Vivek Goyal
  0 siblings, 0 replies; 103+ messages in thread
From: Vivek Goyal @ 2013-05-23 14:34 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, May 23, 2013 at 02:25:42PM +0900, HATAYAMA Daisuke wrote:
> The previous patches newly added holes before each chunk of memory and
> the holes need to be counted in the vmcore file size. There are two
> ways to compute the file size accordingly:
> 
> 1) suppose m is a pointer to the last vmcore object in vmcore_list;
> then the file size is (m->offset + m->size), or
> 
> 2) calculate sum of size of buffers for ELF header, program headers,
> ELF note segments and objects in vmcore_list.
> 
> Although 1) is more direct and simpler than 2), 2) seems better in
> that it reflects the internal object structure of /proc/vmcore. Thus,
> this patch changes get_vmcore_size_elf{64,32} so that they calculate
> the size in the way of 2).
> 
> As a result, both get_vmcore_size_elf{64, 32} have the same
> definition. Merge them as get_vmcore_size.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

Looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
> 
>  fs/proc/vmcore.c |   44 +++++++++++---------------------------------
>  1 files changed, 11 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 9de4d91..f71157d 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -210,36 +210,15 @@ static struct vmcore* __init get_new_element(void)
>  	return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
>  }
>  
> -static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
> +static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
> +				  struct list_head *vc_list)
>  {
> -	int i;
>  	u64 size;
> -	Elf64_Ehdr *ehdr_ptr;
> -	Elf64_Phdr *phdr_ptr;
> -
> -	ehdr_ptr = (Elf64_Ehdr *)elfptr;
> -	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
> -	size = elfsz;
> -	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> -		size += phdr_ptr->p_memsz;
> -		phdr_ptr++;
> -	}
> -	return size;
> -}
> -
> -static u64 __init get_vmcore_size_elf32(char *elfptr, size_t elfsz)
> -{
> -	int i;
> -	u64 size;
> -	Elf32_Ehdr *ehdr_ptr;
> -	Elf32_Phdr *phdr_ptr;
> +	struct vmcore *m;
>  
> -	ehdr_ptr = (Elf32_Ehdr *)elfptr;
> -	phdr_ptr = (Elf32_Phdr*)(elfptr + sizeof(Elf32_Ehdr));
> -	size = elfsz;
> -	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
> -		size += phdr_ptr->p_memsz;
> -		phdr_ptr++;
> +	size = elfsz + elfnotesegsz;
> +	list_for_each_entry(m, vc_list, list) {
> +		size += m->size;
>  	}
>  	return size;
>  }
> @@ -863,20 +842,19 @@ static int __init parse_crash_elf_headers(void)
>  		rc = parse_crash_elf64_headers();
>  		if (rc)
>  			return rc;
> -
> -		/* Determine vmcore size. */
> -		vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
>  	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
>  		rc = parse_crash_elf32_headers();
>  		if (rc)
>  			return rc;
> -
> -		/* Determine vmcore size. */
> -		vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
>  	} else {
>  		pr_warn("Warning: Core image elf header is not sane\n");
>  		return -EINVAL;
>  	}
> +
> +	/* Determine vmcore size. */
> +	vmcore_size = get_vmcore_size(elfcorebuf_sz, elfnotes_sz,
> +				      &vmcore_list);
> +
>  	return 0;
>  }
>  
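
A quick worked example of method 2) with made-up numbers: on a system
with 4 KiB pages, suppose the ELF header plus program headers occupy
2 KiB and are padded to elfcorebuf_sz = 4096, the per-cpu notes total
9 KiB so elfnotes_sz = roundup(9216, 4096) = 12288, and vmcore_list
holds two page-aligned chunks of 1 MiB each. get_vmcore_size() then
returns

    4096 + 12288 + 2 * 1048576 = 2113536

which equals m->offset + m->size for the last object (the second chunk
starts at 4096 + 12288 + 1048576 = 1064960 and is 1048576 bytes long),
so methods 1) and 2) agree, as the changelog argues.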

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 0/9] kdump, vmcore: support mmap() on /proc/vmcore
  2013-05-23  5:24 ` HATAYAMA Daisuke
@ 2013-05-23 14:35   ` Vivek Goyal
  0 siblings, 0 replies; 103+ messages in thread
From: Vivek Goyal @ 2013-05-23 14:35 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, May 23, 2013 at 02:24:55PM +0900, HATAYAMA Daisuke wrote:
> Currently, read to /proc/vmcore is done by read_oldmem() that uses
> ioremap/iounmap per a single page. For example, if memory is 1GB,
> ioremap/iounmap is called (1GB / 4KB)-times, that is, 262144
> times. This causes big performance degradation due to repeated page
> table changes, TLB flush and build-up of VM related objects.
> 
> To address the issue, this patch implements mmap() on /proc/vmcore to
> improve read performance under sufficiently large mapping size.
> 
> In particular, the current main user of this mmap() is makedumpfile,
> which not only reads memory from /proc/vmcore but also does other
> processing like filtering, compression and I/O work.
> 

Thanks Hatayama,

Thanks for the patches. This series looks good to me. I think we just
need an ack from mm folks on patch 5 which introduces
remap_vmalloc_range_partial().

Thanks
Vivek

> Benchmark
> =========
> 
> You can see two benchmarks on terabyte memory system. Both show about
> 40 seconds on 2TB system. This is almost equal to performance by
> experimental kernel-side memory filtering.
> 
> - makedumpfile mmap() benchmark, by Jingbai Ma
>   https://lkml.org/lkml/2013/3/27/19
> 
> - makedumpfile: benchmark on mmap() with /proc/vmcore on 2TB memory system
>   https://lkml.org/lkml/2013/3/26/914
> 
> ChangeLog
> =========
> 
> v7 => v8)
> 
> - Unify set_vmcore_list_offsets_elf{64,32} as set_vmcore_list_offsets.
>   [Patch 2/9]
> - Introduce update_note_header_size_elf{64,32} and cleanup
>   get_note_number_and_size_elf{64,32} and copy_notes_elf{64,32}.
>   [Patch 6/9]
> - Create new patch that sets VM_USERMAP flag in VM object for ELF note
>   segment buffer.
>   [Patch 7/9]
> - Unify get_vmcore_size_elf{64,32} as get_vmcore_size.
>   [Patch 8/9]
> 
> v6 => v7)
> 
> - Rebase 3.10-rc2.
> - Move roundup operation to note segment from patch 2/8 to patch 6/8.
> - Rewrite get_note_number_and_size_elf{64,32} and
>   copy_notes_elf{64,32} in patch 6/8.
> 
> v5 => v6)
> 
> - Change patch order: cleanup patch => PT_LOAD change patch =>
>   vmalloc-related patch => mmap patch.
> - Some cleanups: simplify symbol names, add helper functions for
>   processing ELF note segment and add comments for the helper
>   functions.
> - Fix patch description of patch 7/8.
> 
> v4 => v5)
> 
> - Rebase 3.10-rc1.
> - Introduce remap_vmalloc_range_partial() in order to remap vmalloc
>   memory in a part of vma area.
> - Allocate buffer for ELF note segment at 2nd kernel by vmalloc(). Use
>   remap_vmalloc_range_partial() to remap the memory to userspace.
> 
> v3 => v4)
> 
> - Rebase 3.9-rc7.
> - Drop clean-up patches orthogonal to the main topic of this patch set.
> - Copy ELF note segments in the 2nd kernel just as in v1. Allocate
>   vmcore objects per pages. => See [PATCH 5/8]
> - Map memory referenced by PT_LOAD entry directly even if the start or
>   end of the region doesn't fit inside page boundary, no longer copy
>   them as the previous v3. Then, holes, outside OS memory, are visible
>   from /proc/vmcore. => See [PATCH 7/8]
> 
> v2 => v3)
> 
> - Rebase 3.9-rc3.
> - Copy program headers separately from e_phoff in ELF note segment
>   buffer. Now there's no risk to allocate huge memory if program
>   header table positions after memory segment.
> - Add cleanup patch that removes unnecessary variable.
> - Fix wrongly using the variable that is buffer size configurable at
>   runtime. Instead, use the variable that has original buffer size.
> 
> v1 => v2)
> 
> - Clean up the existing codes: use e_phoff, and remove the assumption
>   on PT_NOTE entries.
> - Fix potential bug that ELF header size is not included in exported
>   vmcoreinfo size.
> - Divide patch modifying read_vmcore() into two: clean-up and primary
>   code change.
> - Put ELF note segments in page-size boundary on the 1st kernel
>   instead of copying them into the buffer on the 2nd kernel.
> 
> Test
> ====
> 
> This patch set is composed based on v3.10-rc2, tested on x86_64,
> x86_32 both with 1GB and with 5GB (over 4GB) memory configurations.
> 
> ---
> 
> HATAYAMA Daisuke (9):
>       vmcore: support mmap() on /proc/vmcore
>       vmcore: calculate vmcore file size from buffer size and total size of vmcore objects
>       vmcore: Allow user process to remap ELF note segment buffer
>       vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory
>       vmalloc: introduce remap_vmalloc_range_partial
>       vmalloc: make find_vm_area check in range
>       vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
>       vmcore: allocate buffer for ELF headers on page-size alignment
>       vmcore: clean up read_vmcore()
> 
> 
>  fs/proc/vmcore.c        |  657 +++++++++++++++++++++++++++++++++--------------
>  include/linux/vmalloc.h |    4 
>  mm/vmalloc.c            |   65 +++--
>  3 files changed, 515 insertions(+), 211 deletions(-)
> 
> -- 
> 
> Thanks.
> HATAYAMA, Daisuke
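
[Editorial note: for readers unfamiliar with the interface being added,
here is a minimal userspace sketch of the access pattern this series
enables -- mapping /proc/vmcore in large windows instead of issuing
page-sized read()s. It is illustrative only; the 64MB window size and
the loop structure are assumptions of this sketch, not taken from
makedumpfile.]

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define WINDOW	(64UL << 20)	/* illustrative 64MB mapping size */

int main(void)
{
	int fd = open("/proc/vmcore", O_RDONLY);
	off_t len, off;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	len = lseek(fd, 0, SEEK_END);	/* size of the exported vmcore */
	for (off = 0; off < len; off += WINDOW) {
		size_t sz = (len - off < (off_t)WINDOW) ? len - off : WINDOW;
		void *p = mmap(NULL, sz, PROT_READ, MAP_PRIVATE, fd, off);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* filter, compress and write out the chunk here */
		munmap(p, sz);
	}
	close(fd);
	return 0;
}

[A single mmap()/munmap() pair per 64MB window replaces 16384 page-sized
read() calls, which is where the reported speedup comes from.]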


* Re: [PATCH v8 2/9] vmcore: allocate buffer for ELF headers on page-size alignment
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 21:46     ` Andrew Morton
  0 siblings, 0 replies; 103+ messages in thread
From: Andrew Morton @ 2013-05-23 21:46 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: vgoyal, ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, 23 May 2013 14:25:07 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> Allocate ELF headers on page-size boundary using __get_free_pages()
> instead of kmalloc().
> 
> Later patch will merge PT_NOTE entries into a single unique one and
> decrease the buffer size actually used. Keep original buffer size in
> variable elfcorebuf_sz_orig to kfree the buffer later and actually
> used buffer size with rounded up to page-size boundary in variable
> elfcorebuf_sz separately.
> 
> The size of part of the ELF buffer exported from /proc/vmcore is
> elfcorebuf_sz.
> 
> The merged, removed PT_NOTE entries, i.e. the range [elfcorebuf_sz,
> elfcorebuf_sz_orig], is filled with 0.
> 
> Use size of the ELF headers as an initial offset value in
> set_vmcore_list_offsets_elf{64,32} and
> process_ptload_program_headers_elf{64,32} in order to indicate that
> the offset includes the holes towards the page boundary.
> 
> As a result, both set_vmcore_list_offsets_elf{64,32} have the same
> definition. Merge them as set_vmcore_list_offsets.
> 
> ...
>
> @@ -526,30 +505,35 @@ static int __init parse_crash_elf64_headers(void)
>  	}
>  
>  	/* Read in all elf headers. */
> -	elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> -	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> +	elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> +	elfcorebuf_sz = elfcorebuf_sz_orig;
> +	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +					       get_order(elfcorebuf_sz_orig));
>  	if (!elfcorebuf)
>  		return -ENOMEM;
>  	addr = elfcorehdr_addr;
> -	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> +	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
>  	if (rc < 0) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
>  	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
>  							&vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -581,30 +565,35 @@ static int __init parse_crash_elf32_headers(void)
>  	}
>  
>  	/* Read in all elf headers. */
> -	elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> -	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> +	elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
> +	elfcorebuf_sz = elfcorebuf_sz_orig;
> +	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +					       get_order(elfcorebuf_sz_orig));
>  	if (!elfcorebuf)
>  		return -ENOMEM;
>  	addr = elfcorehdr_addr;
> -	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
> +	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
>  	if (rc < 0) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  
>  	/* Merge all PT_NOTE headers into one. */
>  	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
>  	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
>  								&vmcore_list);
>  	if (rc) {
> -		kfree(elfcorebuf);
> +		free_pages((unsigned long)elfcorebuf,
> +			   get_order(elfcorebuf_sz_orig));
>  		return rc;
>  	}
> -	set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
> +	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
>  	return 0;
>  }
>  
> @@ -629,14 +618,14 @@ static int __init parse_crash_elf_headers(void)
>  			return rc;
>  
>  		/* Determine vmcore size. */
> -		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
> +		vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
>  	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
>  		rc = parse_crash_elf32_headers();
>  		if (rc)
>  			return rc;
>  
>  		/* Determine vmcore size. */
> -		vmcore_size = get_vmcore_size_elf32(elfcorebuf);
> +		vmcore_size = get_vmcore_size_elf32(elfcorebuf, elfcorebuf_sz);
>  	} else {
>  		pr_warn("Warning: Core image elf header is not sane\n");
>  		return -EINVAL;
> @@ -683,7 +672,8 @@ void vmcore_cleanup(void)
>  		list_del(&m->list);
>  		kfree(m);
>  	}
> -	kfree(elfcorebuf);
> +	free_pages((unsigned long)elfcorebuf,
> +		   get_order(elfcorebuf_sz_orig));
>  	elfcorebuf = NULL;
>  }

- the amount of code duplication is excessive

- the code sometimes leaves elfcorebuf==NULL and sometimes doesn't.

Please review and test this cleanup:

--- a/fs/proc/vmcore.c~vmcore-allocate-buffer-for-elf-headers-on-page-size-alignment-fix
+++ a/fs/proc/vmcore.c
@@ -477,6 +477,12 @@ static void __init set_vmcore_list_offse
 	}
 }
 
+static void free_elfcorebuf(void)
+{
+	free_pages((unsigned long)elfcorebuf, get_order(elfcorebuf_sz_orig));
+	elfcorebuf = NULL;
+}
+
 static int __init parse_crash_elf64_headers(void)
 {
 	int rc=0;
@@ -505,36 +511,31 @@ static int __init parse_crash_elf64_head
 	}
 
 	/* Read in all elf headers. */
-	elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
+	elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) +
+				ehdr.e_phnum * sizeof(Elf64_Phdr);
 	elfcorebuf_sz = elfcorebuf_sz_orig;
-	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
-					       get_order(elfcorebuf_sz_orig));
+	elfcorebuf = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					      get_order(elfcorebuf_sz_orig));
 	if (!elfcorebuf)
 		return -ENOMEM;
 	addr = elfcorehdr_addr;
 	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
-	if (rc < 0) {
-		free_pages((unsigned long)elfcorebuf,
-			   get_order(elfcorebuf_sz_orig));
-		return rc;
-	}
+	if (rc < 0)
+		goto fail;
 
 	/* Merge all PT_NOTE headers into one. */
 	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
-	if (rc) {
-		free_pages((unsigned long)elfcorebuf,
-			   get_order(elfcorebuf_sz_orig));
-		return rc;
-	}
+	if (rc)
+		goto fail;
 	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
 							&vmcore_list);
-	if (rc) {
-		free_pages((unsigned long)elfcorebuf,
-			   get_order(elfcorebuf_sz_orig));
-		return rc;
-	}
+	if (rc)
+		goto fail;
 	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
 	return 0;
+fail:
+	free_elfcorebuf();
+	return rc;
 }
 
 static int __init parse_crash_elf32_headers(void)
@@ -567,34 +568,28 @@ static int __init parse_crash_elf32_head
 	/* Read in all elf headers. */
 	elfcorebuf_sz_orig = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
 	elfcorebuf_sz = elfcorebuf_sz_orig;
-	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
-					       get_order(elfcorebuf_sz_orig));
+	elfcorebuf = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					      get_order(elfcorebuf_sz_orig));
 	if (!elfcorebuf)
 		return -ENOMEM;
 	addr = elfcorehdr_addr;
 	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz_orig, &addr, 0);
-	if (rc < 0) {
-		free_pages((unsigned long)elfcorebuf,
-			   get_order(elfcorebuf_sz_orig));
-		return rc;
-	}
+	if (rc < 0)
+		goto fail;
 
 	/* Merge all PT_NOTE headers into one. */
 	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
-	if (rc) {
-		free_pages((unsigned long)elfcorebuf,
-			   get_order(elfcorebuf_sz_orig));
-		return rc;
-	}
+	if (rc)
+		goto fail;
 	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
 								&vmcore_list);
-	if (rc) {
-		free_pages((unsigned long)elfcorebuf,
-			   get_order(elfcorebuf_sz_orig));
-		return rc;
-	}
+	if (rc)
+		goto fail;
 	set_vmcore_list_offsets(elfcorebuf_sz, &vmcore_list);
 	return 0;
+fail:
+	free_elfcorebuf();
+	return rc;
 }
 
 static int __init parse_crash_elf_headers(void)
@@ -672,8 +667,6 @@ void vmcore_cleanup(void)
 		list_del(&m->list);
 		kfree(m);
 	}
-	free_pages((unsigned long)elfcorebuf,
-		   get_order(elfcorebuf_sz_orig));
-	elfcorebuf = NULL;
+	free_elfcorebuf();
 }
 EXPORT_SYMBOL_GPL(vmcore_cleanup);
_
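
[Editorial note: as background for why the free must use
elfcorebuf_sz_orig rather than the possibly shrunk elfcorebuf_sz:
__get_free_pages() hands out power-of-two blocks of pages, and
free_pages() must be passed the same order that was allocated. Below is
a small userspace model of the rounding involved; the constants and the
helper are illustrative, not the kernel's implementation.]

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/* smallest order such that (PAGE_SIZE << order) >= size */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

int main(void)
{
	/* a 3-page ELF header buffer occupies an order-2 (4-page) block */
	printf("%d\n", get_order(3 * PAGE_SIZE));	/* prints 2 */
	printf("%d\n", get_order(PAGE_SIZE));		/* prints 0 */
	return 0;
}

[Since merge_note_headers_elf{64,32} may shrink elfcorebuf_sz below the
allocated size, only elfcorebuf_sz_orig reliably reproduces the order
for free_pages(), which is what free_elfcorebuf() centralizes.]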



* Re: [PATCH v8 3/9] vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 21:49     ` Andrew Morton
  0 siblings, 0 replies; 103+ messages in thread
From: Andrew Morton @ 2013-05-23 21:49 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: vgoyal, ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, 23 May 2013 14:25:13 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> Treat memory chunks referenced by PT_LOAD program header entries in
> page-size boundary in vmcore_list. Formally, for each range [start,
> end], we set up the corresponding vmcore object in vmcore_list to
> [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
> 
> This change affects layout of /proc/vmcore.

Well, changing a userspace interface is generally unacceptable because
it can break existing userspace code.

If you think the risk is acceptable then please do explain why.  In
great detail!

> The gaps generated by the
> rearrangement are newly made visible to applications as
> holes. Concretely, they are two ranges [rounddown(start, PAGE_SIZE),
> start] and [end, roundup(end, PAGE_SIZE)].
> 
> Suppose variable m points at a vmcore object in vmcore_list, and
> variable phdr points at the program header of PT_LOAD type the
> variable m corresponds to. Then, pictorially:
> 
>   m->offset                    +---------------+
>                                | hole          |
> phdr->p_offset =               +---------------+
>   m->offset + (paddr - start)  |               |\
>                                | kernel memory | phdr->p_memsz
>                                |               |/
>                                +---------------+
>                                | hole          |
>   m->offset + m->size          +---------------+
> 
> where m->offset and m->offset + m->size are always page-size aligned.
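
[Editorial note: the arithmetic in the quoted description can be
sketched as follows; the addresses and variable names are illustrative,
chosen to mirror the diagram rather than the patch's actual code.]

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define rounddown(x, y)	((x) & ~((y) - 1))		/* y: power of two */
#define roundup(x, y)	(((x) + (y) - 1) & ~((y) - 1))

int main(void)
{
	unsigned long paddr = 0x1000a00, end = 0x1003300; /* example chunk */
	unsigned long start = rounddown(paddr, PAGE_SIZE); /* 0x1000000 */
	unsigned long m_size = roundup(end, PAGE_SIZE) - start; /* 4 pages */
	unsigned long m_offset = 0x10000; /* page-aligned offset in vmcore */
	unsigned long p_offset = m_offset + (paddr - start);

	printf("object: [%#lx, %#lx), data at file offset %#lx\n",
	       m_offset, m_offset + m_size, p_offset);
	return 0;
}

[The two holes are [m_offset, p_offset) at the front and the tail up to
m_offset + m_size, exactly as drawn in the diagram above.]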


* Re: [PATCH v8 5/9] vmalloc: introduce remap_vmalloc_range_partial
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 22:00     ` Andrew Morton
  0 siblings, 0 replies; 103+ messages in thread
From: Andrew Morton @ 2013-05-23 22:00 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: vgoyal, ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, 23 May 2013 14:25:24 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> We want to allocate ELF note segment buffer on the 2nd kernel in
> vmalloc space and remap it to user-space in order to reduce the risk
> that memory allocation fails on system with huge number of CPUs and so
> with huge ELF note segment that exceeds 11-order block size.
> 
> Although there's already remap_vmalloc_range for the purpose of
> remapping vmalloc memory to user-space, we need to specify user-space
> range via vma. Mmap on /proc/vmcore needs to remap range across
> multiple objects, so the interface that requires vma to cover full
> range is problematic.
> 
> This patch introduces remap_vmalloc_range_partial that receives
> user-space range as a pair of base address and size and can be used
> for mmap on /proc/vmcore case.
> 
> remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
> 
> ...
>
> +int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
> +				void *kaddr, unsigned long size)
>  {
>  	struct vm_struct *area;
> -	unsigned long uaddr = vma->vm_start;
> -	unsigned long usize = vma->vm_end - vma->vm_start;
>  
> -	if ((PAGE_SIZE-1) & (unsigned long)addr)
> +	size = PAGE_ALIGN(size);
> +
> +	if (((PAGE_SIZE-1) & (unsigned long)uaddr) ||
> +	    ((PAGE_SIZE-1) & (unsigned long)kaddr))
>  		return -EINVAL;

hm, that's ugly.


Why don't we do this:

From: Andrew Morton <akpm@linux-foundation.org>
Subject: include/linux/mm.h: add PAGE_ALIGNED() helper

To test whether an address is aligned to PAGE_SIZE.

Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h |    3 +++
 1 file changed, 3 insertions(+)

diff -puN include/linux/mm.h~a include/linux/mm.h
--- a/include/linux/mm.h~a
+++ a/include/linux/mm.h
@@ -52,6 +52,9 @@ extern unsigned long sysctl_admin_reserv
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
 
+/* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE */
+#define PAGE_ALIGNED(addr)	IS_ALIGNED((unsigned long)addr, PAGE_SIZE)
+
 /*
  * Linux kernel virtual memory manager primitives.
  * The idea being to have a "virtual" mm in the same way
_


(I'd have thought we already had such a thing, but we don't seem to)


Then this:

From: Andrew Morton <akpm@linux-foundation.org>
Subject: vmalloc-introduce-remap_vmalloc_range_partial-fix

use PAGE_ALIGNED()

Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff -puN include/linux/vmalloc.h~vmalloc-introduce-remap_vmalloc_range_partial-fix include/linux/vmalloc.h
diff -puN mm/vmalloc.c~vmalloc-introduce-remap_vmalloc_range_partial-fix mm/vmalloc.c
--- a/mm/vmalloc.c~vmalloc-introduce-remap_vmalloc_range_partial-fix
+++ a/mm/vmalloc.c
@@ -1476,10 +1476,9 @@ static void __vunmap(const void *addr, i
 	if (!addr)
 		return;
 
-	if ((PAGE_SIZE-1) & (unsigned long)addr) {
-		WARN(1, KERN_ERR "Trying to vfree() bad address (%p)\n", addr);
+	if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
+			addr))
 		return;
-	}
 
 	area = remove_vm_area(addr);
 	if (unlikely(!area)) {
@@ -2170,8 +2169,7 @@ int remap_vmalloc_range_partial(struct v
 
 	size = PAGE_ALIGN(size);
 
-	if (((PAGE_SIZE-1) & (unsigned long)uaddr) ||
-	    ((PAGE_SIZE-1) & (unsigned long)kaddr))
+	if (!PAGE_ALIGNED(uaddr) || !PAGE_ALIGNED(kaddr))
 		return -EINVAL;
 
 	area = find_vm_area(kaddr);
_
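
[Editorial note: to make the quoted description concrete, once the
partial variant exists the full-range remap_vmalloc_range() reduces to a
thin wrapper, since the vma supplies both the user address and the size.
A sketch following the quoted interface -- the exact upstream body may
differ:]

int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
			unsigned long pgoff)
{
	/* the vma covers the whole range, so uaddr/size derive from it */
	return remap_vmalloc_range_partial(vma, vma->vm_start,
					   addr + (pgoff << PAGE_SHIFT),
					   vma->vm_end - vma->vm_start);
}

[mmap_vmcore(), by contrast, can call remap_vmalloc_range_partial()
several times with different (uaddr, size) slices of one vma, which is
what the full-range interface could not express.]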



* Re: [PATCH v8 6/9] vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 22:17     ` Andrew Morton
  -1 siblings, 0 replies; 103+ messages in thread
From: Andrew Morton @ 2013-05-23 22:17 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: vgoyal, ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, 23 May 2013 14:25:30 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> There are two reasons why we don't allocate the ELF note segment in
> the 1st kernel (old memory) on a page boundary: to keep backward
> compatibility with old kernels, and because doing so would waste a
> non-trivial amount of memory on the round-up to page boundaries, since
> most of the buffers are in the per-cpu area.
> 
> ELF notes are per-cpu, so the total size of the ELF note segments
> depends on the number of CPUs. The current maximum number of CPUs on
> x86_64 is 5192, and there's already a system with 4192 CPUs from SGI,
> where the total size amounts to 1MB. This can grow in the near future,
> or possibly even now on another architecture with a larger note size
> per single CPU. Thus, to avoid failing large memory-block allocations,
> we allocate the ELF note segment buffer in vmalloc memory.
> 
> This patch adds the elfnotes_buf and elfnotes_sz variables to keep the
> pointer to the ELF note segment buffer and its size. There's no longer
> a vmcore object corresponding to the ELF note segment in
> vmcore_list. Accordingly, read_vmcore() gains a new case for the ELF
> note segment, and set_vmcore_list_offsets_elf{64,32}() and the other
> helper functions start calculating offsets from the sum of the sizes
> of the ELF headers and the ELF note segment.
> 
> ...
>
> @@ -154,6 +157,26 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
>  			return acc;
>  	}
>  
> +	/* Read Elf note segment */
> +	if (*fpos < elfcorebuf_sz + elfnotes_sz) {
> +		void *kaddr;
> +
> +		tsz = elfcorebuf_sz + elfnotes_sz - *fpos;
> +		if (buflen < tsz)
> +			tsz = buflen;

We have min().

>
> ...
>
> +/* Merges all the PT_NOTE headers into one. */
> +static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
> +					   char **notes_buf, size_t *notes_sz)
> +{
> +	int i, nr_ptnote=0, rc=0;
> +	char *tmp;
> +	Elf64_Ehdr *ehdr_ptr;
> +	Elf64_Phdr phdr;
> +	u64 phdr_sz = 0, note_off;
> +
> +	ehdr_ptr = (Elf64_Ehdr *)elfptr;
> +
> +	rc = update_note_header_size_elf64(ehdr_ptr);
> +	if (rc < 0)
> +		return rc;
> +
> +	rc = get_note_number_and_size_elf64(ehdr_ptr, &nr_ptnote, &phdr_sz);
> +	if (rc < 0)
> +		return rc;
> +
> +	*notes_sz = roundup(phdr_sz, PAGE_SIZE);
> +	*notes_buf = vzalloc(*notes_sz);

I think this gets leaked in a number of places.

> +	if (!*notes_buf)
> +		return -ENOMEM;
> +
> +	rc = copy_notes_elf64(ehdr_ptr, *notes_buf);
> +	if (rc < 0)
> +		return rc;
> +
>  	/* Prepare merged PT_NOTE program header. */
>  	phdr.p_type    = PT_NOTE;
>  	phdr.p_flags   = 0;
>  	note_off = sizeof(Elf64_Ehdr) +
>  			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> -	phdr.p_offset  = note_off;
> +	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>  	phdr.p_vaddr   = phdr.p_paddr = 0;
>  	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>  	phdr.p_align   = 0;

Please review and test:

From: Andrew Morton <akpm@linux-foundation.org>
Subject: vmcore-allocate-elf-note-segment-in-the-2nd-kernel-vmalloc-memory-fix

use min(), fix error-path vzalloc() leaks

Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/vmcore.c |   16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff -puN fs/proc/vmcore.c~vmcore-allocate-elf-note-segment-in-the-2nd-kernel-vmalloc-memory-fix fs/proc/vmcore.c
--- a/fs/proc/vmcore.c~vmcore-allocate-elf-note-segment-in-the-2nd-kernel-vmalloc-memory-fix
+++ a/fs/proc/vmcore.c
@@ -142,9 +142,7 @@ static ssize_t read_vmcore(struct file *
 
 	/* Read ELF core header */
 	if (*fpos < elfcorebuf_sz) {
-		tsz = elfcorebuf_sz - *fpos;
-		if (buflen < tsz)
-			tsz = buflen;
+		tsz = min(elfcorebuf_sz - (size_t)*fpos, buflen);
 		if (copy_to_user(buffer, elfcorebuf + *fpos, tsz))
 			return -EFAULT;
 		buflen -= tsz;
@@ -161,9 +159,7 @@ static ssize_t read_vmcore(struct file *
 	if (*fpos < elfcorebuf_sz + elfnotes_sz) {
 		void *kaddr;
 
-		tsz = elfcorebuf_sz + elfnotes_sz - *fpos;
-		if (buflen < tsz)
-			tsz = buflen;
+		tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)*fpos, buflen);
 		kaddr = elfnotes_buf + *fpos - elfcorebuf_sz;
 		if (copy_to_user(buffer, kaddr, tsz))
 			return -EFAULT;
@@ -179,9 +175,7 @@ static ssize_t read_vmcore(struct file *
 
 	list_for_each_entry(m, &vmcore_list, list) {
 		if (*fpos < m->offset + m->size) {
-			tsz = m->offset + m->size - *fpos;
-			if (buflen < tsz)
-				tsz = buflen;
+			tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);
 			start = m->paddr + *fpos - m->offset;
 			tmp = read_from_oldmem(buffer, tsz, &start, 1);
 			if (tmp < 0)
@@ -710,6 +704,8 @@ static void free_elfcorebuf(void)
 {
 	free_pages((unsigned long)elfcorebuf, get_order(elfcorebuf_sz_orig));
 	elfcorebuf = NULL;
+	vfree(elfnotes_buf);
+	elfnotes_buf = NULL;
 }
 
 static int __init parse_crash_elf64_headers(void)
@@ -898,8 +894,6 @@ void vmcore_cleanup(void)
 		list_del(&m->list);
 		kfree(m);
 	}
-	vfree(elfnotes_buf);
-	elfnotes_buf = NULL;
 	free_elfcorebuf();
 }
 EXPORT_SYMBOL_GPL(vmcore_cleanup);
_
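
A note on why the casts above are needed: the kernel's min() (in
linux/kernel.h at the time) is type-checked and refuses to compare a
u64 against a size_t at compile time, so one side is narrowed with a
cast, while min_t() names the common type and casts both sides. A
hedged illustration of the pattern, kernel context assumed and values
made up:

	size_t total  = 4096;	/* e.g. elfcorebuf_sz */
	size_t buflen = 8192;	/* length of the user buffer */
	u64    fpos   = 100;	/* file position */
	size_t tsz;

	/* min(total - fpos, buflen) would trip min()'s type check,
	 * since total - fpos is promoted to u64; so cast one side ... */
	tsz = min(total - (size_t)fpos, buflen);
	/* ... or let min_t() cast both operands to the named type. */
	tsz = min_t(size_t, total - fpos, buflen);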


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-23  5:25   ` HATAYAMA Daisuke
@ 2013-05-23 22:24     ` Andrew Morton
  -1 siblings, 0 replies; 103+ messages in thread
From: Andrew Morton @ 2013-05-23 22:24 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: vgoyal, ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec,
	linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel, walken,
	hughd, kosaki.motohiro

On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> This patch introduces mmap_vmcore().
> 
> Don't permit writable or executable mappings even with mprotect(),
> because this mmap() is aimed at reading crash dump memory.
> A non-writable mapping is also a requirement of remap_pfn_range() when
> mapping linear pages onto non-consecutive physical pages; see
> is_cow_mapping().
> 
> Set the VM_MIXEDMAP flag to remap memory both by remap_pfn_range() and
> by remap_vmalloc_range_partial() at the same time for a single
> vma. do_munmap() can then correctly clean up a vma partially remapped
> by the two functions in the abnormal case. See zap_pte_range(),
> vm_normal_page() and their comments for details.
> 
> On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
> limitation comes from the fact that the third argument of
> remap_pfn_range(), pfn, is only 32 bits wide on x86-32: unsigned long.

More reviewing and testing, please.


From: Andrew Morton <akpm@linux-foundation.org>
Subject: vmcore-support-mmap-on-proc-vmcore-fix

use min(), switch to conventional error-unwinding approach

Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/proc/vmcore.c |   27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff -puN fs/proc/vmcore.c~vmcore-support-mmap-on-proc-vmcore-fix fs/proc/vmcore.c
--- a/fs/proc/vmcore.c~vmcore-support-mmap-on-proc-vmcore-fix
+++ a/fs/proc/vmcore.c
@@ -218,9 +218,7 @@ static int mmap_vmcore(struct file *file
 	if (start < elfcorebuf_sz) {
 		u64 pfn;
 
-		tsz = elfcorebuf_sz - start;
-		if (size < tsz)
-			tsz = size;
+		tsz = min(elfcorebuf_sz - (size_t)start, size);
 		pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
 		if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
 				    vma->vm_page_prot))
@@ -236,15 +234,11 @@ static int mmap_vmcore(struct file *file
 	if (start < elfcorebuf_sz + elfnotes_sz) {
 		void *kaddr;
 
-		tsz = elfcorebuf_sz + elfnotes_sz - start;
-		if (size < tsz)
-			tsz = size;
+		tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)start, size);
 		kaddr = elfnotes_buf + start - elfcorebuf_sz;
 		if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
-						kaddr, tsz)) {
-			do_munmap(vma->vm_mm, vma->vm_start, len);
-			return -EAGAIN;
-		}
+						kaddr, tsz))
+			goto fail;
 		size -= tsz;
 		start += tsz;
 		len += tsz;
@@ -257,16 +251,12 @@ static int mmap_vmcore(struct file *file
 		if (start < m->offset + m->size) {
 			u64 paddr = 0;
 
-			tsz = m->offset + m->size - start;
-			if (size < tsz)
-				tsz = size;
+			tsz = min_t(size_t, m->offset + m->size - start, size);
 			paddr = m->paddr + start - m->offset;
 			if (remap_pfn_range(vma, vma->vm_start + len,
 					    paddr >> PAGE_SHIFT, tsz,
-					    vma->vm_page_prot)) {
-				do_munmap(vma->vm_mm, vma->vm_start, len);
-				return -EAGAIN;
-			}
+					    vma->vm_page_prot))
+				goto fail;
 			size -= tsz;
 			start += tsz;
 			len += tsz;
@@ -277,6 +267,9 @@ static int mmap_vmcore(struct file *file
 	}
 
 	return 0;
+fail:
+	do_munmap(vma->vm_mm, vma->vm_start, len);
+	return -EAGAIN;
 }
 
 static const struct file_operations proc_vmcore_operations = {
_
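
For a quick functional check from userspace, the intended access
pattern is a read-only private mapping; mmap_vmcore() refuses anything
writable or executable. A minimal hedged sketch (mapping a single page
is an arbitrary choice here; real consumers map whole segments):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/vmcore", O_RDONLY);
	void *p;

	if (fd < 0)
		return 1;
	/* PROT_READ only: writable/executable mappings are rejected */
	p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return 1;
	}
	if (memcmp(p, "\177ELF", 4) == 0)
		puts("mapped the ELF core header");
	munmap(p, 4096);
	close(fd);
	return 0;
}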


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-23 22:24     ` Andrew Morton
@ 2013-05-24  9:02     ` Maxim Uvarov
  2013-05-27  1:49         ` HATAYAMA Daisuke
  -1 siblings, 1 reply; 103+ messages in thread
From: Maxim Uvarov @ 2013-05-24  9:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: HATAYAMA Daisuke, riel, hughd, jingbai.ma, kexec, linux-kernel,
	lisa.mitchell, linux-mm, Atsushi Kumagai, Eric W. Biederman,
	kosaki.motohiro, zhangyanfei, walken, Cliff Wickman, Vivek Goyal

2013/5/24 Andrew Morton <akpm@linux-foundation.org>

> On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <
> d.hatayama@jp.fujitsu.com> wrote:
>
> > This patch introduces mmap_vmcore().
> >
> > Don't permit writable nor executable mapping even with mprotect()
> > because this mmap() is aimed at reading crash dump memory.
> > Non-writable mapping is also requirement of remap_pfn_range() when
> > mapping linear pages on non-consecutive physical pages; see
> > is_cow_mapping().
> >
> > Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
> > remap_vmalloc_range_partial at the same time for a single
> > vma. do_munmap() can correctly clean partially remapped vma with two
> > functions in abnormal case. See zap_pte_range(), vm_normal_page() and
> > their comments for details.
> >
> > On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This
> > limitation comes from the fact that the third argument of
> > remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.
>
> More reviewing and testing, please.
>
>
Do you have a git pull for both the kernel and userland changes? I would
like to do some more testing on my machines.

Maxim.


>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: vmcore-support-mmap-on-proc-vmcore-fix
>
> use min(), switch to conventional error-unwinding approach
>
> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Lisa Mitchell <lisa.mitchell@hp.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  fs/proc/vmcore.c |   27 ++++++++++-----------------
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff -puN fs/proc/vmcore.c~vmcore-support-mmap-on-proc-vmcore-fix
> fs/proc/vmcore.c
> --- a/fs/proc/vmcore.c~vmcore-support-mmap-on-proc-vmcore-fix
> +++ a/fs/proc/vmcore.c
> @@ -218,9 +218,7 @@ static int mmap_vmcore(struct file *file
>         if (start < elfcorebuf_sz) {
>                 u64 pfn;
>
> -               tsz = elfcorebuf_sz - start;
> -               if (size < tsz)
> -                       tsz = size;
> +               tsz = min(elfcorebuf_sz - (size_t)start, size);
>                 pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
>                 if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
>                                     vma->vm_page_prot))
> @@ -236,15 +234,11 @@ static int mmap_vmcore(struct file *file
>         if (start < elfcorebuf_sz + elfnotes_sz) {
>                 void *kaddr;
>
> -               tsz = elfcorebuf_sz + elfnotes_sz - start;
> -               if (size < tsz)
> -                       tsz = size;
> +               tsz = min(elfcorebuf_sz + elfnotes_sz - (size_t)start,
> size);
>                 kaddr = elfnotes_buf + start - elfcorebuf_sz;
>                 if (remap_vmalloc_range_partial(vma, vma->vm_start + len,
> -                                               kaddr, tsz)) {
> -                       do_munmap(vma->vm_mm, vma->vm_start, len);
> -                       return -EAGAIN;
> -               }
> +                                               kaddr, tsz))
> +                       goto fail;
>                 size -= tsz;
>                 start += tsz;
>                 len += tsz;
> @@ -257,16 +251,12 @@ static int mmap_vmcore(struct file *file
>                 if (start < m->offset + m->size) {
>                         u64 paddr = 0;
>
> -                       tsz = m->offset + m->size - start;
> -                       if (size < tsz)
> -                               tsz = size;
> +                       tsz = min_t(size_t, m->offset + m->size - start,
> size);
>                         paddr = m->paddr + start - m->offset;
>                         if (remap_pfn_range(vma, vma->vm_start + len,
>                                             paddr >> PAGE_SHIFT, tsz,
> -                                           vma->vm_page_prot)) {
> -                               do_munmap(vma->vm_mm, vma->vm_start, len);
> -                               return -EAGAIN;
> -                       }
> +                                           vma->vm_page_prot))
> +                               goto fail;
>                         size -= tsz;
>                         start += tsz;
>                         len += tsz;
> @@ -277,6 +267,9 @@ static int mmap_vmcore(struct file *file
>         }
>
>         return 0;
> +fail:
> +       do_munmap(vma->vm_mm, vma->vm_start, len);
> +       return -EAGAIN;
>  }
>
>  static const struct file_operations proc_vmcore_operations = {
> _
>
>



-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 3/9] vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
  2013-05-23 21:49     ` Andrew Morton
@ 2013-05-24 13:12       ` Vivek Goyal
  -1 siblings, 0 replies; 103+ messages in thread
From: Vivek Goyal @ 2013-05-24 13:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: HATAYAMA Daisuke, ebiederm, cpw, kumagai-atsushi, lisa.mitchell,
	kexec, linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel,
	walken, hughd, kosaki.motohiro

On Thu, May 23, 2013 at 02:49:28PM -0700, Andrew Morton wrote:
> On Thu, 23 May 2013 14:25:13 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> 
> > Treat memory chunks referenced by PT_LOAD program header entries in
> > page-size boundary in vmcore_list. Formally, for each range [start,
> > end], we set up the corresponding vmcore object in vmcore_list to
> > [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
> > 
> > This change affects layout of /proc/vmcore.
> 
> Well, changing a userspace interface is generally unacceptable because
> it can break existing userspace code.
> 
> If you think the risk is acceptable then please do explain why.  In
> great detail!

I think it should not be a problem, as /proc/vmcore is useful only when
one parses the ELF headers and then accesses the contents of the file
based on the header information. This patch just introduces additional
areas in the /proc/vmcore file, and the ELF headers still point to the
right contents. So any tool that parses the ELF headers and then
accesses the file contents based on that info should still be fine.

AFAIK, no user-space tool should be broken by this.

Thanks
Vivek
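
The access pattern described here can be made concrete: a consumer
never walks /proc/vmcore linearly, it reads the ELF header, then the
program headers, and then only the byte ranges those headers advertise,
so newly introduced holes are never touched. A hedged sketch of such a
reader (64-bit dump assumed, error handling abbreviated):

#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;
	int fd, i;

	fd = open("/proc/vmcore", O_RDONLY);
	if (fd < 0 || pread(fd, &ehdr, sizeof(ehdr), 0) != sizeof(ehdr))
		return 1;

	for (i = 0; i < ehdr.e_phnum; i++) {
		off_t off = ehdr.e_phoff + (off_t)i * sizeof(phdr);

		if (pread(fd, &phdr, sizeof(phdr), off) != sizeof(phdr))
			return 1;
		/* only [p_offset, p_offset + p_filesz) is ever read;
		 * the holes between segments are simply skipped */
		printf("segment %d: offset %llu size %llu\n", i,
		       (unsigned long long)phdr.p_offset,
		       (unsigned long long)phdr.p_filesz);
	}
	close(fd);
	return 0;
}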

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 3/9] vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list
  2013-05-24 13:12       ` Vivek Goyal
@ 2013-05-27  0:13         ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-27  0:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vivek Goyal, ebiederm, cpw, kumagai-atsushi, lisa.mitchell,
	kexec, linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel,
	walken, hughd, kosaki.motohiro

(2013/05/24 22:12), Vivek Goyal wrote:
> On Thu, May 23, 2013 at 02:49:28PM -0700, Andrew Morton wrote:
>> On Thu, 23 May 2013 14:25:13 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>>
>>> Treat memory chunks referenced by PT_LOAD program header entries in
>>> page-size boundary in vmcore_list. Formally, for each range [start,
>>> end], we set up the corresponding vmcore object in vmcore_list to
>>> [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].
>>>
>>> This change affects layout of /proc/vmcore.
>>
>> Well, changing a userspace interface is generally unacceptable because
>> it can break existing userspace code.
>>
>> If you think the risk is acceptable then please do explain why.  In
>> great detail!
>
> I think it should not be a problem as /proc/vmcore is useful only when
> one parses the elf headers and then accesses the contents of file based
> on the header information. This patch just introduces additional areas
> in /proc/vmcore file and ELF headers still point to right contents. So
> any tool parsing ELF headers and then accessing file contents based on
> that info should still be fine.
>
> AFAIK, no user space tool should be broken there.
>
> Thanks
> Vivek
>

Yes, the change only introduces new holes between the components of the
ELF file, and tools never reach the holes as long as they access the
file by looking up the program header table and the other tables. The
cp command does touch the holes, but it still trivially works well.

-- 
Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-24  9:02     ` Maxim Uvarov
  2013-05-27  1:49         ` HATAYAMA Daisuke
@ 2013-05-27  1:49         ` HATAYAMA Daisuke
  0 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-05-27  1:49 UTC (permalink / raw)
  To: Maxim Uvarov
  Cc: Andrew Morton, riel, hughd, jingbai.ma, kexec, linux-kernel,
	lisa.mitchell, linux-mm, Atsushi Kumagai, Eric W. Biederman,
	kosaki.motohiro, zhangyanfei, walken, Cliff Wickman, Vivek Goyal

(2013/05/24 18:02), Maxim Uvarov wrote:
>
>
>
> 2013/5/24 Andrew Morton <akpm@linux-foundation.org>
>
>     On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
>      > This patch introduces mmap_vmcore().
>      >
>      > Don't permit writable nor executable mapping even with mprotect()
>      > because this mmap() is aimed at reading crash dump memory.
>      > Non-writable mapping is also requirement of remap_pfn_range() when
>      > mapping linear pages on non-consecutive physical pages; see
>      > is_cow_mapping().
>      >
>      > Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
>      > remap_vmalloc_range_partial at the same time for a single
>      > vma. do_munmap() can correctly clean partially remapped vma with two
>      > functions in abnormal case. See zap_pte_range(), vm_normal_page() and
>      > their comments for details.
>      >
>      > On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This
>      > limitation comes from the fact that the third argument of
>      > remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.
>
>     More reviewing and testing, please.
>
>
> Do you have git pull for both kernel and userland changes? I would like to do some more testing on my machines.
>
> Maxim.

Thanks! That's very helpful.

-- 
Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-27  1:49         ` HATAYAMA Daisuke
@ 2013-05-30  9:14         ` Maxim Uvarov
  2013-05-30  9:26             ` Zhang Yanfei
  -1 siblings, 1 reply; 103+ messages in thread
From: Maxim Uvarov @ 2013-05-30  9:14 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: Andrew Morton, riel, hughd, jingbai.ma, kexec, linux-kernel,
	lisa.mitchell, linux-mm, Atsushi Kumagai, Eric W. Biederman,
	kosaki.motohiro, zhangyanfei, walken, Cliff Wickman, Vivek Goyal

2013/5/27 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

> (2013/05/24 18:02), Maxim Uvarov wrote:
>
>>
>>
>>
>> 2013/5/24 Andrew Morton <akpm@linux-foundation.org>
>>
>>
>>     On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>>
>>      > This patch introduces mmap_vmcore().
>>      >
>>      > Don't permit writable or executable mappings, even with mprotect(),
>>      > because this mmap() is aimed at reading crash dump memory.
>>      > A non-writable mapping is also a requirement of remap_pfn_range() when
>>      > mapping linear pages on non-consecutive physical pages; see
>>      > is_cow_mapping().
>>      >
>>      > Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
>>      > remap_vmalloc_range_partial at the same time for a single
>>      > vma. do_munmap() can correctly clean a vma partially remapped by the
>>      > two functions in the abnormal case. See zap_pte_range(),
>>      > vm_normal_page() and their comments for details.
>>      >
>>      > On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
>>      > limitation comes from the fact that the third argument of
>>      > remap_pfn_range(), pfn, is a 32-bit unsigned long on x86-32.
>>
>>     More reviewing and testing, please.
>>
>>
>> Do you have a git pull for both kernel and userland changes? I would like
>> to do some more testing on my machines.
>>
>> Maxim.
>>
>
> Thanks! That's very helpful.
>
> --
> Thanks.
> HATAYAMA, Daisuke
>

Any update on this? Where can I check out all sources?

-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-30  9:14         ` Maxim Uvarov
  2013-05-30  9:26             ` Zhang Yanfei
@ 2013-05-30  9:26             ` Zhang Yanfei
  0 siblings, 0 replies; 103+ messages in thread
From: Zhang Yanfei @ 2013-05-30  9:26 UTC (permalink / raw)
  To: Maxim Uvarov
  Cc: HATAYAMA Daisuke, Andrew Morton, riel, hughd, jingbai.ma, kexec,
	linux-kernel, lisa.mitchell, linux-mm, Atsushi Kumagai,
	Eric W. Biederman, kosaki.motohiro, walken, Cliff Wickman,
	Vivek Goyal

On 05/30/2013 05:14 PM, Maxim Uvarov wrote:
> 
> 
> 
> 2013/5/27 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> 
>     (2013/05/24 18:02), Maxim Uvarov wrote:
> 
> 
> 
> 
>         2013/5/24 Andrew Morton <akpm@linux-foundation.org>
> 
> 
>             On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> 
>              > This patch introduces mmap_vmcore().
>              >
>              > Don't permit writable or executable mappings, even with mprotect(),
>              > because this mmap() is aimed at reading crash dump memory.
>              > A non-writable mapping is also a requirement of remap_pfn_range() when
>              > mapping linear pages on non-consecutive physical pages; see
>              > is_cow_mapping().
>              >
>              > Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
>              > remap_vmalloc_range_partial at the same time for a single
>              > vma. do_munmap() can correctly clean a vma partially remapped by the
>              > two functions in the abnormal case. See zap_pte_range(),
>              > vm_normal_page() and their comments for details.
>              >
>              > On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
>              > limitation comes from the fact that the third argument of
>              > remap_pfn_range(), pfn, is a 32-bit unsigned long on x86-32.
> 
>             More reviewing and testing, please.
> 
> 
>         Do you have a git pull for both kernel and userland changes? I would like to do some more testing on my machines.
> 
>         Maxim.
> 
> 
>     Thanks! That's very helpful.
> 
>     -- 
>     Thanks.
>     HATAYAMA, Daisuke
> 
> Any update on this? Where can I check out all sources?

This series is now in Andrew Morton's -mm tree.

-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-30  9:26             ` Zhang Yanfei
  (?)
  (?)
@ 2013-05-30 10:30             ` Maxim Uvarov
  2013-06-03  8:43                 ` Atsushi Kumagai
  -1 siblings, 1 reply; 103+ messages in thread
From: Maxim Uvarov @ 2013-05-30 10:30 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: HATAYAMA Daisuke, Andrew Morton, riel, hughd, jingbai.ma, kexec,
	linux-kernel, lisa.mitchell, linux-mm, Atsushi Kumagai,
	Eric W. Biederman, kosaki.motohiro, walken, Cliff Wickman,
	Vivek Goyal

2013/5/30 Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

> On 05/30/2013 05:14 PM, Maxim Uvarov wrote:
> >
> >
> >
> > > 2013/5/27 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> >
> >     (2013/05/24 18:02), Maxim Uvarov wrote:
> >
> >
> >
> >
> > >         2013/5/24 Andrew Morton <akpm@linux-foundation.org>
> >
> >
> > >             On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> >
> > >              > This patch introduces mmap_vmcore().
> > >              >
> > >              > Don't permit writable or executable mappings, even with
> > >              > mprotect(), because this mmap() is aimed at reading crash
> > >              > dump memory. A non-writable mapping is also a requirement
> > >              > of remap_pfn_range() when mapping linear pages on
> > >              > non-consecutive physical pages; see is_cow_mapping().
> > >              >
> > >              > Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range
> > >              > and by remap_vmalloc_range_partial at the same time for a
> > >              > single vma. do_munmap() can correctly clean a vma partially
> > >              > remapped by the two functions in the abnormal case. See
> > >              > zap_pte_range(), vm_normal_page() and their comments for
> > >              > details.
> > >              >
> > >              > On x86-32 PAE kernels, mmap() supports at most 16TB of
> > >              > memory. This limitation comes from the fact that the third
> > >              > argument of remap_pfn_range(), pfn, is a 32-bit unsigned
> > >              > long on x86-32.
> >
> >             More reviewing and testing, please.
> >
> >
> >         Do you have a git pull for both kernel and userland changes? I
> >         would like to do some more testing on my machines.
> >
> >         Maxim.
> >
> >
> >     Thanks! That's very helpful.
> >
> >     --
> >     Thanks.
> >     HATAYAMA, Daisuke
> >
> > Any update on this? Where can I check out all sources?
>
> This series is now in Andrew Morton's -mm tree.
>
Ok, and what about makedumpfile changes? Is it possible to fetch them from
somewhere?


> --
> Thanks.
> Zhang Yanfei
>



-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-30 10:30             ` Maxim Uvarov
  2013-06-03  8:43                 ` Atsushi Kumagai
@ 2013-06-03  8:43                 ` Atsushi Kumagai
  0 siblings, 0 replies; 103+ messages in thread
From: Atsushi Kumagai @ 2013-06-03  8:43 UTC (permalink / raw)
  To: muvarov
  Cc: zhangyanfei, d.hatayama, akpm, riel, hughd, jingbai.ma, kexec,
	linux-kernel, lisa.mitchell, linux-mm, ebiederm, kosaki.motohiro,
	walken, cpw, vgoyal

Hello Maxim,

On Thu, 30 May 2013 14:30:01 +0400
Maxim Uvarov <muvarov@gmail.com> wrote:

> 2013/5/30 Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> 
> > On 05/30/2013 05:14 PM, Maxim Uvarov wrote:
> > >
> > >
> > >
> > > 2013/5/27 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> > >
> > >     (2013/05/24 18:02), Maxim Uvarov wrote:
> > >
> > >
> > >
> > >
> > >         2013/5/24 Andrew Morton <akpm@linux-foundation.org>
> > >
> > >
> > >             On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> > >
> > >              > This patch introduces mmap_vmcore().
> > >              >
> > >              > Don't permit writable or executable mappings, even with
> > >              > mprotect(), because this mmap() is aimed at reading crash
> > >              > dump memory. A non-writable mapping is also a requirement
> > >              > of remap_pfn_range() when mapping linear pages on
> > >              > non-consecutive physical pages; see is_cow_mapping().
> > >              >
> > >              > Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range
> > >              > and by remap_vmalloc_range_partial at the same time for a
> > >              > single vma. do_munmap() can correctly clean a vma partially
> > >              > remapped by the two functions in the abnormal case. See
> > >              > zap_pte_range(), vm_normal_page() and their comments for
> > >              > details.
> > >              >
> > >              > On x86-32 PAE kernels, mmap() supports at most 16TB of
> > >              > memory. This limitation comes from the fact that the third
> > >              > argument of remap_pfn_range(), pfn, is a 32-bit unsigned
> > >              > long on x86-32.
> > >
> > >             More reviewing and testing, please.
> > >
> > >
> > >         Do you have a git pull for both kernel and userland changes? I
> > >         would like to do some more testing on my machines.
> > >
> > >         Maxim.
> > >
> > >
> > >     Thanks! That's very helpful.
> > >
> > >     --
> > >     Thanks.
> > >     HATAYAMA, Daisuke
> > >
> > > Any update on this? Where can I check out all sources?
> >
> > This series is now in Andrew Morton's -mm tree.
> >
> Ok, and what about makedumpfile changes? Is it possible to fetch them from
> somewhere?

You can fetch them from here; the "mmap" branch contains the change:

  git://git.code.sf.net/p/makedumpfile/code

And they will be merged into v1.5.4.


Thanks
Atsushi Kumagai

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-03  8:43                 ` Atsushi Kumagai
  (?)
  (?)
@ 2013-06-04 15:34                 ` Maxim Uvarov
  2013-06-07  1:11                     ` Zhang Yanfei
  -1 siblings, 1 reply; 103+ messages in thread
From: Maxim Uvarov @ 2013-06-04 15:34 UTC (permalink / raw)
  To: Atsushi Kumagai
  Cc: riel, kexec, hughd, linux-kernel, lisa.mitchell, Vivek Goyal,
	linux-mm, HATAYAMA Daisuke, Zhang Yanfei, Eric W. Biederman,
	kosaki.motohiro, Andrew Morton, walken, Cliff Wickman,
	jingbai.ma

2013/6/3 Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>

> Hello Maxim,
>
> On Thu, 30 May 2013 14:30:01 +0400
> Maxim Uvarov <muvarov@gmail.com> wrote:
>
> > 2013/5/30 Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> >
> > > On 05/30/2013 05:14 PM, Maxim Uvarov wrote:
> > > >
> > > >
> > > >
> > > > 2013/5/27 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> > > >
> > > >     (2013/05/24 18:02), Maxim Uvarov wrote:
> > > >
> > > >
> > > >
> > > >
> > > >         2013/5/24 Andrew Morton <akpm@linux-foundation.org>
> > > >
> > > >
> > > >             On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> > > >
> > > >              > This patch introduces mmap_vmcore().
> > > >              >
> > > >              > Don't permit writable or executable mappings, even with
> > > >              > mprotect(), because this mmap() is aimed at reading crash
> > > >              > dump memory. A non-writable mapping is also a requirement
> > > >              > of remap_pfn_range() when mapping linear pages on
> > > >              > non-consecutive physical pages; see is_cow_mapping().
> > > >              >
> > > >              > Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range
> > > >              > and by remap_vmalloc_range_partial at the same time for a
> > > >              > single vma. do_munmap() can correctly clean a vma partially
> > > >              > remapped by the two functions in the abnormal case. See
> > > >              > zap_pte_range(), vm_normal_page() and their comments for
> > > >              > details.
> > > >              >
> > > >              > On x86-32 PAE kernels, mmap() supports at most 16TB of
> > > >              > memory. This limitation comes from the fact that the third
> > > >              > argument of remap_pfn_range(), pfn, is a 32-bit unsigned
> > > >              > long on x86-32.
> > > >
> > > >             More reviewing and testing, please.
> > > >
> > > >
> > > >         Do you have a git pull for both kernel and userland changes? I
> > > >         would like to do some more testing on my machines.
> > > >
> > > >         Maxim.
> > > >
> > > >
> > > >     Thanks! That's very helpful.
> > > >
> > > >     --
> > > >     Thanks.
> > > >     HATAYAMA, Daisuke
> > > >
> > > > Any update on this? Where can I check out all sources?
> > >
> > > This series is now in Andrew Morton's -mm tree.
> > >
> > Ok, and what about makedumpfile changes? Is it possible to fetch them
> > from somewhere?
>
> You can fetch them from here; the "mmap" branch contains the change:
>
>   git://git.code.sf.net/p/makedumpfile/code
>
> And they will be merged into v1.5.4.
>
>
Thank you, got it. But I still do not see the kernel patches in the akpm tree:
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Should I look at a different branch?

Maxim.



>
> Thanks
> Atsushi Kumagai
>



-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-05-23  5:25   ` HATAYAMA Daisuke
  (?)
@ 2013-06-06 21:31     ` Arnd Bergmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Arnd Bergmann @ 2013-06-06 21:31 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: vgoyal, ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell,
	kexec, linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel,
	walken, hughd, kosaki.motohiro

On Thursday 23 May 2013 14:25:48 HATAYAMA Daisuke wrote:
> This patch introduces mmap_vmcore().
> 
> Don't permit writable or executable mappings, even with mprotect(),
> because this mmap() is aimed at reading crash dump memory.
> A non-writable mapping is also a requirement of remap_pfn_range() when
> mapping linear pages on non-consecutive physical pages; see
> is_cow_mapping().
> 
> Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
> remap_vmalloc_range_partial at the same time for a single
> vma. do_munmap() can correctly clean a vma partially remapped by the
> two functions in the abnormal case. See zap_pte_range(),
> vm_normal_page() and their comments for details.
> 
> On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
> limitation comes from the fact that the third argument of
> remap_pfn_range(), pfn, is a 32-bit unsigned long on x86-32.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Acked-by: Vivek Goyal <vgoyal@redhat.com>

I get build errors on 'make randconfig' from this, when building
NOMMU kernels on ARM. I suppose the new feature should be hidden
in #ifdef CONFIG_MMU.

	Arnd

^ permalink raw reply	[flat|nested] 103+ messages in thread
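One hedged sketch of the guard Arnd suggests, assuming the mmap handler is simply compiled out on !MMU configurations so that plain read() of /proc/vmcore keeps working. This illustrates the idea only, not the fix that was merged, and the placeholder body stands in for the full remapping code from patch 9/9:

#ifdef CONFIG_MMU
static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
{
	/* full remapping body as posted in patch 9/9 */
	return -EINVAL;		/* placeholder so the sketch compiles */
}
#else
#define mmap_vmcore NULL	/* ->mmap unset: mmap() fails, read() works */
#endif

static const struct file_operations proc_vmcore_operations = {
	.read	= read_vmcore,
	.llseek	= default_llseek,
	.mmap	= mmap_vmcore,
};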

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-06 21:31     ` Arnd Bergmann
  (?)
@ 2013-06-07  1:01       ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-06-07  1:01 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: vgoyal, ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell,
	kexec, linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel,
	walken, hughd, kosaki.motohiro

(2013/06/07 6:31), Arnd Bergmann wrote:
> On Thursday 23 May 2013 14:25:48 HATAYAMA Daisuke wrote:
>> This patch introduces mmap_vmcore().
>>
>> Don't permit writable or executable mappings, even with mprotect(),
>> because this mmap() is aimed at reading crash dump memory.
>> A non-writable mapping is also a requirement of remap_pfn_range() when
>> mapping linear pages on non-consecutive physical pages; see
>> is_cow_mapping().
>>
>> Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
>> remap_vmalloc_range_partial at the same time for a single
>> vma. do_munmap() can correctly clean a vma partially remapped by the
>> two functions in the abnormal case. See zap_pte_range(),
>> vm_normal_page() and their comments for details.
>>
>> On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
>> limitation comes from the fact that the third argument of
>> remap_pfn_range(), pfn, is a 32-bit unsigned long on x86-32.
>>
>> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
>> Acked-by: Vivek Goyal <vgoyal@redhat.com>
>
> I get build errors on 'make randconfig' from this, when building
> NOMMU kernels on ARM. I suppose the new feature should be hidden
> in #ifdef CONFIG_MMU.
>
> 	Arnd
>

Thanks for trying the build and your report!

OTOH, I don't have any no-MMU machines, only an x86 box, so I cannot reproduce this build error. Could you give me your build log? I want to use it to work out which part depends on CONFIG_MMU.

-- 
Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-04 15:34                 ` Maxim Uvarov
  2013-06-07  1:11                     ` Zhang Yanfei
@ 2013-06-07  1:11                     ` Zhang Yanfei
  0 siblings, 0 replies; 103+ messages in thread
From: Zhang Yanfei @ 2013-06-07  1:11 UTC (permalink / raw)
  To: Maxim Uvarov
  Cc: Atsushi Kumagai, riel, kexec, hughd, linux-kernel, lisa.mitchell,
	Vivek Goyal, linux-mm, HATAYAMA Daisuke, Eric W. Biederman,
	kosaki.motohiro, Andrew Morton, walken, Cliff Wickman,
	jingbai.ma

On 06/04/2013 11:34 PM, Maxim Uvarov wrote:
> 
> 
> 
> 2013/6/3 Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
> 
>     Hello Maxim,
> 
>     On Thu, 30 May 2013 14:30:01 +0400
>     Maxim Uvarov <muvarov@gmail.com> wrote:
> 
>     > 2013/5/30 Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
>     >
>     > > On 05/30/2013 05:14 PM, Maxim Uvarov wrote:
>     > > >
>     > > >
>     > > >
>     > > > 2013/5/27 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
>     > > >
>     > > >     (2013/05/24 18:02), Maxim Uvarov wrote:
>     > > >
>     > > >
>     > > >
>     > > >
>     > > >         2013/5/24 Andrew Morton <akpm@linux-foundation.org>
>     > > >
>     > > >
>     > > >             On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>     > > >
>     > > >              > This patch introduces mmap_vmcore().
>     > > >              >
>     > > >              > Don't permit writable or executable mappings, even with
>     > > >              > mprotect(), because this mmap() is aimed at reading crash
>     > > >              > dump memory. A non-writable mapping is also a requirement
>     > > >              > of remap_pfn_range() when mapping linear pages on
>     > > >              > non-consecutive physical pages; see is_cow_mapping().
>     > > >              >
>     > > >              > Set the VM_MIXEDMAP flag to remap memory by remap_pfn_range
>     > > >              > and by remap_vmalloc_range_partial at the same time for a
>     > > >              > single vma. do_munmap() can correctly clean a vma partially
>     > > >              > remapped by the two functions in the abnormal case. See
>     > > >              > zap_pte_range(), vm_normal_page() and their comments for
>     > > >              > details.
>     > > >              >
>     > > >              > On x86-32 PAE kernels, mmap() supports at most 16TB of
>     > > >              > memory. This limitation comes from the fact that the third
>     > > >              > argument of remap_pfn_range(), pfn, is a 32-bit unsigned
>     > > >              > long on x86-32.
>     > > >
>     > > >             More reviewing and testing, please.
>     > > >
>     > > >
>     > > >         Do you have a git pull for both kernel and userland changes? I
>     > > >         would like to do some more testing on my machines.
>     > > >
>     > > >         Maxim.
>     > > >
>     > > >
>     > > >     Thanks! That's very helpful.
>     > > >
>     > > >     --
>     > > >     Thanks.
>     > > >     HATAYAMA, Daisuke
>     > > >
>     > > > Any update on this? Where can I check out all sources?
>     > >
>     > > This series is now in Andrew Morton's -mm tree.
>     > >
>     > > Ok, and what about makedumpfile changes? Is it possible to fetch them
>     > > from somewhere?
> 
>     You can fetch them from here; the "mmap" branch contains the change:
> 
>       git://git.code.sf.net/p/makedumpfile/code
> 
>     And they will be merged into v1.5.4.
> 
> 
> Thank you, got it. But I still do not see the kernel patches in the akpm tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> 
> 
> Should I look at a different branch?

Now it is merged into the linux-next tree you listed above. See the commit:

author	HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>	2013-06-06 00:40:01 (GMT)
committer	Stephen Rothwell <sfr@canb.auug.org.au>	2013-06-06 05:50:03 (GMT)
commit	4be2c06c30e4c3994d86e0be24ff1af12d2c71d5 (patch)
tree	d7fb8c64c628600e8ba24481927f087fc11c2986
parent	99f80952861807e521ed30c22925f009f543a5ec (diff)

-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-07  1:01       ` HATAYAMA Daisuke
  (?)
@ 2013-06-07 18:34         ` Arnd Bergmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Arnd Bergmann @ 2013-06-07 18:34 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: vgoyal, ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell,
	kexec, linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel,
	walken, hughd, kosaki.motohiro

On Friday 07 June 2013, HATAYAMA Daisuke wrote:
> Thanks for trying the build and your report!
> 
> OTOH, I don't have no-MMU architectures; x86 box only. I cannot reproduce this build error. 
> Could you give me your build log? I want to use it to detect what part depends on CONFIG_MMU.

What I get is a link-time error:

fs/built-in.o: In function `mmap_vmcore':
:(.text+0x4bc18): undefined reference to `remap_vmalloc_range_partial'
fs/built-in.o: In function `merge_note_headers_elf32.constprop.4':
:(.init.text+0x142c): undefined reference to `find_vm_area'

and I used this patch to temporarily work around the problem, effectively disabling all
of /proc/vmcore on non-MMU kernels.

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 37e4f8d..9a078ef 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -55,7 +55,7 @@ static inline int is_kdump_kernel(void)
 
 static inline int is_vmcore_usable(void)
 {
-	return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+	return IS_ENABLED(CONFIG_MMU) && is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
 }
 
 /* vmcore_unusable() marks the vmcore as unusable,


For testing, I used the ARM at91x40_defconfig and manually turned on VMCORE support in
menuconfig, but it had already happened earlier with "randconfig". On most distros you can
these days install an ARM cross compiler using yum or apt-get and build
the kernel yourself with 'make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-'

	Arnd

^ permalink raw reply related	[flat|nested] 103+ messages in thread
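The workaround above leans on IS_ENABLED() rather than a bare #ifdef: the macro expands to a compile-time 1 or 0 (covering both =y and =m), so the disabled branch is still parsed and type-checked but folded away by the compiler. A small sketch of the same pattern, assuming the usual crash_dump.h declarations; is_vmcore_usable_sketch is an illustrative name, not kernel code:

#include <linux/kconfig.h>
#include <linux/crash_dump.h>

/* Sketch of the IS_ENABLED() pattern from the workaround above: on
 * !MMU kernels the condition is a constant 0, so the vmcore is
 * reported unusable without any preprocessor guards around callers. */
static inline int is_vmcore_usable_sketch(void)
{
	if (!IS_ENABLED(CONFIG_MMU))	/* constant-folded on !MMU */
		return 0;
	return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR;
}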

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
@ 2013-06-07 18:34         ` Arnd Bergmann
  0 siblings, 0 replies; 103+ messages in thread
From: Arnd Bergmann @ 2013-06-07 18:34 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: riel, hughd, jingbai.ma, kexec, linux-kernel, lisa.mitchell,
	linux-mm, kumagai-atsushi, ebiederm, kosaki.motohiro,
	zhangyanfei, akpm, walken, cpw, vgoyal

On Friday 07 June 2013, HATAYAMA Daisuke wrote:
> Thanks for trying the build and your report!
> 
> OTOH, I don't have no-MMU architectures; x86 box only. I cannot reproduce this build error. 
> Could you give me your build log? I want to use it to detect what part depends on CONFIG_MMU.

What I get is a link-time error:

fs/built-in.o: In function `mmap_vmcore':
:(.text+0x4bc18): undefined reference to `remap_vmalloc_range_partial'
fs/built-in.o: In function `merge_note_headers_elf32.constprop.4':
:(.init.text+0x142c): undefined reference to `find_vm_area'

and I used this patch to temporarily work around the problem, effectively disabling all
of /proc/vmcore on non-MMU kernels.

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 37e4f8d..9a078ef 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -55,7 +55,7 @@ static inline int is_kdump_kernel(void)
 
 static inline int is_vmcore_usable(void)
 {
-	return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+	return IS_ENABLED(CONFIG_MMU) && is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
 }
 
 /* vmcore_unusable() marks the vmcore as unusable,


For testing, I used ARM at91x40_defconfig and manually turned on VMCORE support in
menuconfig, but it happened before using "randconfig". On most distros you can
these days install an arm cross compiler using yum or apt-get and build
the kernel yourself with 'make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-'

	Arnd

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-07 18:34         ` Arnd Bergmann
  (?)
@ 2013-06-08 10:42           ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-06-08 10:42 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: vgoyal, ebiederm, akpm, cpw, kumagai-atsushi, lisa.mitchell,
	kexec, linux-kernel, zhangyanfei, jingbai.ma, linux-mm, riel,
	walken, hughd, kosaki.motohiro

2013/6/8 Arnd Bergmann <arnd@arndb.de>:
> On Friday 07 June 2013, HATAYAMA Daisuke wrote:
>> Thanks for trying the build and your report!
>>
>> OTOH, I don't have no-MMU architectures; x86 box only. I cannot reproduce this build error.
>> Could you give me your build log? I want to use it to detect what part depends on CONFIG_MMU.
>
> What I get is a link-time error:
>
> fs/built-in.o: In function `mmap_vmcore':
> :(.text+0x4bc18): undefined reference to `remap_vmalloc_range_partial'
> fs/built-in.o: In function `merge_note_headers_elf32.constprop.4':
> :(.init.text+0x142c): undefined reference to `find_vm_area'
>
> and I used this patch to temporarily work around the problem, effectively disabling all
> of /proc/vmcore on non-MMU kernels.
>
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index 37e4f8d..9a078ef 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -55,7 +55,7 @@ static inline int is_kdump_kernel(void)
>
>  static inline int is_vmcore_usable(void)
>  {
> -       return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
> +       return IS_ENABLED(CONFIG_MMU) && is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
>  }
>
>  /* vmcore_unusable() marks the vmcore as unusable,
>
>
> For testing, I used ARM at91x40_defconfig and manually turned on VMCORE support in
> menuconfig, but it happened before using "randconfig". On most distros you can
> these days install an arm cross compiler using yum or apt-get and build
> the kernel yourself with 'make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi-'
>

Thanks for the detailed explanation. To be honest, before posting the patch
set I had totally forgotten about the existence of cross-compilers and the
need for build checks on multiple architectures... I tried installing a
cross compiler for ARM and successfully got one using yum. That feels much
easier than some years ago, when I tried building one from source myself.

I successfully reproduced the build error you saw, and I found that I had
overlooked no-MMU systems. The build error is caused by my mmap patch set,
which maps physically non-contiguous objects into virtually contiguous
user-space following the ELF layout. For this, an MMU is essential.
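
To make the dependency concrete, here is a minimal, hypothetical sketch of
the mapping step, assuming the helpers introduced by this series (the note
buffer is assumed to be vmalloc'ed with VM_USERMAP set, and both lengths
are assumed page-aligned). It only illustrates the technique and is not the
actual fs/proc/vmcore.c code:

#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Hypothetical illustration, not the real mmap_vmcore(): one user VMA
 * is stitched together from a vmalloc'ed ELF note buffer and a PT_LOAD
 * segment living at some old-kernel PFN. Both helpers rewrite page
 * tables, which is exactly what a no-MMU kernel cannot do.
 */
static int vmcore_map_sketch(struct vm_area_struct *vma,
			     void *notes_buf, unsigned long notes_len,
			     unsigned long seg_pfn, unsigned long seg_len)
{
	unsigned long uaddr = vma->vm_start;

	/* Reading a crash dump: refuse writable or executable mappings. */
	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
		return -EPERM;

	/* Note segment: vmalloc memory remapped into part of the VMA. */
	if (remap_vmalloc_range_partial(vma, uaddr, notes_buf, notes_len))
		return -EAGAIN;
	uaddr += notes_len;

	/* Dump segment: physical pages remapped directly by PFN. */
	if (remap_pfn_range(vma, uaddr, seg_pfn, seg_len,
			    vma->vm_page_prot))
		return -EAGAIN;

	return 0;
}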

I'll post a patch to disable mmap() on /proc/vmcore on no-MMU systems next
week; a rough sketch of such a guard follows below. I cannot use my company
email address at the moment.
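
A minimal sketch of one possible guard, assuming a CONFIG_MMU ifdef around
the mmap handler (illustrative only; this is not the actual follow-up
patch). Leaving .mmap unset makes mmap(2) on /proc/vmcore fail cleanly
while read(2) keeps working:

static const struct file_operations proc_vmcore_operations = {
	.read	= read_vmcore,
	.llseek	= default_llseek,
#ifdef CONFIG_MMU
	.mmap	= mmap_vmcore,	/* relies on page-table remapping */
#endif
};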

Thanks.
HATAYAMA, Daisuke

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-03  8:43                 ` Atsushi Kumagai
                                   ` (2 preceding siblings ...)
  (?)
@ 2013-06-28 16:40                 ` Maxim Uvarov
  2013-06-30 23:53                     ` HATAYAMA Daisuke
  -1 siblings, 1 reply; 103+ messages in thread
From: Maxim Uvarov @ 2013-06-28 16:40 UTC (permalink / raw)
  To: Atsushi Kumagai
  Cc: riel, kexec, hughd, linux-kernel, lisa.mitchell, vgoyal,
	linux-mm, d.hatayama, zhangyanfei, ebiederm, kosaki.motohiro,
	akpm, walken, cpw, jingbai.ma

I did a test on a 1TB machine. The total vmcore capture and save took 143
minutes, while the vmcore size increased from 9GB to 59GB.

Will do some debug for that.

Maxim.

2013/6/3 Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>

> Hello Maxim,
>
> On Thu, 30 May 2013 14:30:01 +0400
> Maxim Uvarov <muvarov@gmail.com> wrote:
>
> > 2013/5/30 Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> >
> > > On 05/30/2013 05:14 PM, Maxim Uvarov wrote:
> > > >
> > > > 2013/5/27 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> > > >
> > > >     (2013/05/24 18:02), Maxim Uvarov wrote:
> > > >
> > > >         2013/5/24 Andrew Morton <akpm@linux-foundation.org>
> > > >
> > > >             On Thu, 23 May 2013 14:25:48 +0900 HATAYAMA Daisuke
> > > >             <d.hatayama@jp.fujitsu.com> wrote:
> > > >
> > > >              > This patch introduces mmap_vmcore().
> > > >              >
> > > >              > Don't permit writable nor executable mapping even with
> > > >              > mprotect() because this mmap() is aimed at reading crash
> > > >              > dump memory. Non-writable mapping is also a requirement
> > > >              > of remap_pfn_range() when mapping linear pages on
> > > >              > non-consecutive physical pages; see is_cow_mapping().
> > > >              >
> > > >              > Set the VM_MIXEDMAP flag to remap memory by
> > > >              > remap_pfn_range and by remap_vmalloc_range_partial at
> > > >              > the same time for a single vma. do_munmap() can
> > > >              > correctly clean a partially remapped vma with the two
> > > >              > functions in the abnormal case. See zap_pte_range(),
> > > >              > vm_normal_page() and their comments for details.
> > > >              >
> > > >              > On x86-32 PAE kernels, mmap() supports at most 16TB
> > > >              > memory only. This limitation comes from the fact that
> > > >              > the third argument of remap_pfn_range(), pfn, is of
> > > >              > 32-bit length on x86-32: unsigned long.
> > > >
> > > >             More reviewing and testing, please.
> > > >
> > > >         Do you have a git pull for both kernel and userland changes?
> > > >         I would like to do some more testing on my machines.
> > > >
> > > >         Maxim.
> > > >
> > > >     Thanks! That's very helpful.
> > > >
> > > >     --
> > > >     Thanks.
> > > >     HATAYAMA, Daisuke
> > > >
> > > > Any update for this? Where can I check out all the sources?
> > >
> > > This series is now in Andrew Morton's -mm tree.
> >
> > Ok, and what about makedumpfile changes? Is it possible to fetch them
> > from somewhere?
>
> You can fetch them from here; the "mmap" branch is the change:
>
>   git://git.code.sf.net/p/makedumpfile/code
>
> And they will be merged into v1.5.4.
>
> Thanks
> Atsushi Kumagai
>



-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-28 16:40                 ` Maxim Uvarov
  2013-06-30 23:53                     ` HATAYAMA Daisuke
@ 2013-06-30 23:53                     ` HATAYAMA Daisuke
  0 siblings, 0 replies; 103+ messages in thread
From: HATAYAMA Daisuke @ 2013-06-30 23:53 UTC (permalink / raw)
  To: Maxim Uvarov
  Cc: Atsushi Kumagai, riel, kexec, hughd, linux-kernel, lisa.mitchell,
	vgoyal, linux-mm, zhangyanfei, ebiederm, kosaki.motohiro, akpm,
	walken, cpw, jingbai.ma

(2013/06/29 1:40), Maxim Uvarov wrote:
> Did test on 1TB machine. Total vmcore capture and save took 143 minutes while vmcore size increased from 9Gb to 59Gb.
>
> Will do some debug for that.
>
> Maxim.

Please show me your kdump configuration file and tell me what you did in the test and how you confirmed the result.

-- 
Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-06-30 23:53                     ` HATAYAMA Daisuke
  (?)
  (?)
@ 2013-07-01 14:34                     ` Maxim Uvarov
  2013-07-01 19:53                         ` Andrew Morton
  -1 siblings, 1 reply; 103+ messages in thread
From: Maxim Uvarov @ 2013-07-01 14:34 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: Atsushi Kumagai, riel, kexec, hughd, linux-kernel, lisa.mitchell,
	vgoyal, linux-mm, zhangyanfei, ebiederm, kosaki.motohiro, akpm,
	walken, cpw, jingbai.ma

2013/7/1 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>

> (2013/06/29 1:40), Maxim Uvarov wrote:
>
>> Did test on 1TB machine. Total vmcore capture and save took 143 minutes
>> while vmcore size increased from 9Gb to 59Gb.
>>
>> Will do some debug for that.
>>
>> Maxim.
>>
>
> Please show me your kdump configuration file and tell me what you did in
> the test and how you confirmed the result.
>
>
Hello Hatayama,

I re-ran the tests in a dev environment. I took your latest kernel patchset
for vmcore from patchwork, plus the devel branch of makedumpfile, plus a fix
to open and write to /dev/null. I ran this test on a 1TB memory machine with
memory in use by some user-space processes, with crashkernel=384M.

Please see my results for the makedumpfile runs:
[gzip compression]
-c -d31 /dev/null
real 37.8 m
user 29.51 m
sys 7.12 m

[no compression]
-d31 /dev/null
real 27 m
user 23 m
sys   4 m

[no compression, disable cyclic mode]
-d31 --non-cyclic /dev/null
real 26.25 m
user 23 m
sys 3.13 m

[gzip compression]
-c -d31 /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.75   38.840351         110    352717           mmap
 44.55   31.607620          90    352716         1 munmap
  0.70    0.497668           0  25497667           brk
  0.00    0.000356           0    111920           write
  0.00    0.000280           0    111904           lseek
  0.00    0.000025           4         7           open
  0.00    0.000000           0       473           read
  0.00    0.000000           0         7           close
  0.00    0.000000           0         3           fstat
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         2           unlink
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00   70.946300              26427420         1 total


I used a 2.6.39 kernel plus your patches, because my machine works reliably
with it. I think the kernel version is not significant here, because
/proc/vmcore is very isolated.

Are those the same numbers that you get?

What is interesting is that makedumpfile spends almost all of its time in
user space. And in the case without compression and without disk I/O, the
process time is not significantly reduced. What is the bottleneck in the
'copy dump' phase?
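
For what it's worth, the access pattern behind the ~350k mmap/munmap calls
in the strace output above looks roughly like the following hypothetical
reader; the 4 MiB window size and the checksum standing in for the real
filtering work are assumptions for illustration, not makedumpfile's actual
code:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define WINDOW (4UL << 20)	/* assumed 4 MiB mapping window */

int main(void)
{
	int fd = open("/proc/vmcore", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	off_t size = lseek(fd, 0, SEEK_END);	/* total dump size */
	unsigned long long sum = 0;		/* stand-in for real work */

	for (off_t off = 0; off < size; off += WINDOW) {
		size_t len = (size - off < (off_t)WINDOW) ?
			     (size_t)(size - off) : (size_t)WINDOW;
		/* One mmap/munmap pair per window: page tables are set up
		 * in bulk instead of per-page ioremap on every read(). */
		unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE,
					fd, off);
		if (p == MAP_FAILED) { perror("mmap"); return 1; }
		for (size_t i = 0; i < len; i++)
			sum += p[i];
		munmap(p, len);
	}
	printf("scanned %lld bytes, checksum %llu\n", (long long)size, sum);
	close(fd);
	return 0;
}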

Thank you,
Maxim.




> --
> Thanks.
> HATAYAMA, Daisuke
>
>


-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-07-01 14:34                     ` Maxim Uvarov
  2013-07-01 19:53                         ` Andrew Morton
@ 2013-07-01 19:53                         ` Andrew Morton
  0 siblings, 0 replies; 103+ messages in thread
From: Andrew Morton @ 2013-07-01 19:53 UTC (permalink / raw)
  To: Maxim Uvarov
  Cc: HATAYAMA Daisuke, Atsushi Kumagai, riel, kexec, hughd,
	linux-kernel, lisa.mitchell, vgoyal, linux-mm, zhangyanfei,
	ebiederm, kosaki.motohiro, walken, cpw, jingbai.ma

On Mon, 1 Jul 2013 18:34:43 +0400 Maxim Uvarov <muvarov@gmail.com> wrote:

> 2013/7/1 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> 
> > (2013/06/29 1:40), Maxim Uvarov wrote:
> >
> >> Did test on 1TB machine. Total vmcore capture and save took 143 minutes
> >> while vmcore size increased from 9Gb to 59Gb.
> >>
> >> Will do some debug for that.
> >>
> >> Maxim.
> >>
> >
> > Please show me your kdump configuration file and tell me what you did in
> > the test and how you confirmed the result.
> >
> >
> Hello Hatayama,
> 
> I re-run tests in dev env. I took your latest kernel patchset from
> patchwork for vmcore + devel branch of makedumpfile + fix to open and write
> to /dev/null. Run this test on 1Tb memory machine with memory used by some
> user space processes. crashkernel=384M.
> 
> Please see my results for makedumpfile process work:
> [gzip compression]
> -c -d31 /dev/null
> real 37.8 m
> user 29.51 m
> sys 7.12 m
> 
> [no compression]
> -d31 /dev/null
> real 27 m
> user 23 m
> sys   4 m
> 
> [no compression, disable cyclic mode]
> -d31 --non-cyclic /dev/null
> real 26.25 m
> user 23 m
> sys 3.13 m
> 
> [gzip compression]
> -c -d31 /dev/null
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  54.75   38.840351         110    352717           mmap
>  44.55   31.607620          90    352716         1 munmap
>   0.70    0.497668           0  25497667           brk
>   0.00    0.000356           0    111920           write
>   0.00    0.000280           0    111904           lseek
>   0.00    0.000025           4         7           open
>   0.00    0.000000           0       473           read
>   0.00    0.000000           0         7           close
>   0.00    0.000000           0         3           fstat
>   0.00    0.000000           0         1           getpid
>   0.00    0.000000           0         1           execve
>   0.00    0.000000           0         1           uname
>   0.00    0.000000           0         2           unlink
>   0.00    0.000000           0         1           arch_prctl
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   70.946300              26427420         1 total
> 

I have no point of comparison here.  Is this performance good, or is
the mmap-based approach still a lot more expensive?



^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore
  2013-07-01 19:53                         ` Andrew Morton
  (?)
  (?)
@ 2013-07-02  7:00                         ` Maxim Uvarov
  -1 siblings, 0 replies; 103+ messages in thread
From: Maxim Uvarov @ 2013-07-02  7:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: HATAYAMA Daisuke, Atsushi Kumagai, riel, kexec, hughd,
	linux-kernel, lisa.mitchell, vgoyal, linux-mm, zhangyanfei,
	ebiederm, kosaki.motohiro, walken, cpw, jingbai.ma

2013/7/1 Andrew Morton <akpm@linux-foundation.org>

> On Mon, 1 Jul 2013 18:34:43 +0400 Maxim Uvarov <muvarov@gmail.com> wrote:
>
> > 2013/7/1 HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> >
> > > (2013/06/29 1:40), Maxim Uvarov wrote:
> > >
> > >> Did test on 1TB machine. Total vmcore capture and save took 143
> minutes
> > >> while vmcore size increased from 9Gb to 59Gb.
> > >>
> > >> Will do some debug for that.
> > >>
> > >> Maxim.
> > >>
> > >
> > > Please show me your kdump configuration file and tell me what you did
> in
> > > the test and how you confirmed the result.
> > >
> > >
> > Hello Hatayama,
> >
> > I re-run tests in dev env. I took your latest kernel patchset from
> > patchwork for vmcore + devel branch of makedumpfile + fix to open and
> write
> > to /dev/null. Run this test on 1Tb memory machine with memory used by
> some
> > user space processes. crashkernel=384M.
> >
> > Please see my results for makedumpfile process work:
> > [gzip compression]
> > -c -d31 /dev/null
> > real 37.8 m
> > user 29.51 m
> > sys 7.12 m
> >
> > [no compression]
> > -d31 /dev/null
> > real 27 m
> > user 23 m
> > sys   4 m
> >
> > [no compression, disable cyclic mode]
> > -d31 --non-cyclic /dev/null
> > real 26.25 m
> > user 23 m
> > sys 3.13 m
> >
> > [gzip compression]
> > -c -d31 /dev/null
> > % time     seconds  usecs/call     calls    errors syscall
> > ------ ----------- ----------- --------- --------- ----------------
> >  54.75   38.840351         110    352717           mmap
> >  44.55   31.607620          90    352716         1 munmap
> >   0.70    0.497668           0  25497667           brk
> >   0.00    0.000356           0    111920           write
> >   0.00    0.000280           0    111904           lseek
> >   0.00    0.000025           4         7           open
> >   0.00    0.000000           0       473           read
> >   0.00    0.000000           0         7           close
> >   0.00    0.000000           0         3           fstat
> >   0.00    0.000000           0         1           getpid
> >   0.00    0.000000           0         1           execve
> >   0.00    0.000000           0         1           uname
> >   0.00    0.000000           0         2           unlink
> >   0.00    0.000000           0         1           arch_prctl
> > ------ ----------- ----------- --------- --------- ----------------
> > 100.00   70.946300              26427420         1 total
> >
>
> I have no point of comparison here.  Is this performance good, or is
> the mmap-based approach still a lot more expensive?
>

Comparing to the non-mmap version: with compression, the total dump process
takes about 30 minutes against 130 minutes. And the kernel load is very
minimal. So we definitely need these patches.

-- 
Best regards,
Maxim Uvarov

^ permalink raw reply	[flat|nested] 103+ messages in thread

end of thread, other threads:[~2013-07-02  7:00 UTC | newest]

Thread overview: 103+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-23  5:24 [PATCH v8 0/9] kdump, vmcore: support mmap() on /proc/vmcore HATAYAMA Daisuke
2013-05-23  5:24 ` HATAYAMA Daisuke
2013-05-23  5:24 ` HATAYAMA Daisuke
2013-05-23  5:25 ` [PATCH v8 1/9] vmcore: clean up read_vmcore() HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25 ` [PATCH v8 2/9] vmcore: allocate buffer for ELF headers on page-size alignment HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23 14:22   ` Vivek Goyal
2013-05-23 14:22     ` Vivek Goyal
2013-05-23 14:22     ` Vivek Goyal
2013-05-23 21:46   ` Andrew Morton
2013-05-23 21:46     ` Andrew Morton
2013-05-23 21:46     ` Andrew Morton
2013-05-23  5:25 ` [PATCH v8 3/9] vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23 21:49   ` Andrew Morton
2013-05-23 21:49     ` Andrew Morton
2013-05-23 21:49     ` Andrew Morton
2013-05-24 13:12     ` Vivek Goyal
2013-05-24 13:12       ` Vivek Goyal
2013-05-24 13:12       ` Vivek Goyal
2013-05-27  0:13       ` HATAYAMA Daisuke
2013-05-27  0:13         ` HATAYAMA Daisuke
2013-05-27  0:13         ` HATAYAMA Daisuke
2013-05-23  5:25 ` [PATCH v8 4/9] vmalloc: make find_vm_area check in range HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25 ` [PATCH v8 5/9] vmalloc: introduce remap_vmalloc_range_partial HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23 22:00   ` Andrew Morton
2013-05-23 22:00     ` Andrew Morton
2013-05-23 22:00     ` Andrew Morton
2013-05-23  5:25 ` [PATCH v8 6/9] vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23 14:28   ` Vivek Goyal
2013-05-23 14:28     ` Vivek Goyal
2013-05-23 14:28     ` Vivek Goyal
2013-05-23 22:17   ` Andrew Morton
2013-05-23 22:17     ` Andrew Morton
2013-05-23 22:17     ` Andrew Morton
2013-05-23  5:25 ` [PATCH v8 7/9] vmcore: Allow user process to remap ELF note segment buffer HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23 14:32   ` Vivek Goyal
2013-05-23 14:32     ` Vivek Goyal
2013-05-23 14:32     ` Vivek Goyal
2013-05-23  5:25 ` [PATCH v8 8/9] vmcore: calculate vmcore file size from buffer size and total size of vmcore objects HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23 14:34   ` Vivek Goyal
2013-05-23 14:34     ` Vivek Goyal
2013-05-23 14:34     ` Vivek Goyal
2013-05-23  5:25 ` [PATCH v8 9/9] vmcore: support mmap() on /proc/vmcore HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23  5:25   ` HATAYAMA Daisuke
2013-05-23 22:24   ` Andrew Morton
2013-05-23 22:24     ` Andrew Morton
2013-05-23 22:24     ` Andrew Morton
2013-05-24  9:02     ` Maxim Uvarov
2013-05-27  1:49       ` HATAYAMA Daisuke
2013-05-27  1:49         ` HATAYAMA Daisuke
2013-05-27  1:49         ` HATAYAMA Daisuke
2013-05-30  9:14         ` Maxim Uvarov
2013-05-30  9:26           ` Zhang Yanfei
2013-05-30  9:26             ` Zhang Yanfei
2013-05-30  9:26             ` Zhang Yanfei
2013-05-30 10:30             ` Maxim Uvarov
2013-06-03  8:43               ` Atsushi Kumagai
2013-06-03  8:43                 ` Atsushi Kumagai
2013-06-03  8:43                 ` Atsushi Kumagai
2013-06-04 15:34                 ` Maxim Uvarov
2013-06-07  1:11                   ` Zhang Yanfei
2013-06-07  1:11                     ` Zhang Yanfei
2013-06-07  1:11                     ` Zhang Yanfei
2013-06-28 16:40                 ` Maxim Uvarov
2013-06-30 23:53                   ` HATAYAMA Daisuke
2013-06-30 23:53                     ` HATAYAMA Daisuke
2013-06-30 23:53                     ` HATAYAMA Daisuke
2013-07-01 14:34                     ` Maxim Uvarov
2013-07-01 19:53                       ` Andrew Morton
2013-07-01 19:53                         ` Andrew Morton
2013-07-01 19:53                         ` Andrew Morton
2013-07-02  7:00                         ` Maxim Uvarov
2013-06-06 21:31   ` Arnd Bergmann
2013-06-06 21:31     ` Arnd Bergmann
2013-06-06 21:31     ` Arnd Bergmann
2013-06-07  1:01     ` HATAYAMA Daisuke
2013-06-07  1:01       ` HATAYAMA Daisuke
2013-06-07  1:01       ` HATAYAMA Daisuke
2013-06-07 18:34       ` Arnd Bergmann
2013-06-07 18:34         ` Arnd Bergmann
2013-06-07 18:34         ` Arnd Bergmann
2013-06-08 10:42         ` HATAYAMA Daisuke
2013-06-08 10:42           ` HATAYAMA Daisuke
2013-06-08 10:42           ` HATAYAMA Daisuke
2013-05-23 14:35 ` [PATCH v8 0/9] kdump, " Vivek Goyal
2013-05-23 14:35   ` Vivek Goyal
2013-05-23 14:35   ` Vivek Goyal
