* [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore
@ 2013-02-14 10:11 ` HATAYAMA Daisuke
  0 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:11 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

Currently, reads from /proc/vmcore are done by read_oldmem(), which
calls ioremap/iounmap for every single page. For example, if memory is
1GB, ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
times. This causes significant performance degradation.
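As a quick sanity check on the arithmetic above (one ioremap/iounmap
pair per 4 KiB page of dump memory):

```c
#include <assert.h>

/* One ioremap()/iounmap() pair is issued per 4 KiB page of the dump. */
static unsigned long ioremap_pairs(unsigned long mem_bytes)
{
	return mem_bytes / 4096;
}
```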

To address the issue, this patch series implements mmap() on
/proc/vmcore to improve read performance. My simple benchmark shows an
improvement from about 200 [MiB/sec] to over 50.0 [GiB/sec].

Benchmark
=========

= Machine spec
  - CPU: Intel(R) Xeon(R) CPU E7- 4820 @ 2.00GHz (4 sockets, 8 cores) (*)
  - memory: 32GB
  - kernel: 3.8-rc6 with this patch
  - vmcore size: 31.7GB

  (*) only 1 cpu is used in the 2nd kernel now.

= Benchmark Case

1) copy /proc/vmcore *WITHOUT* mmap() on /proc/vmcore

$ time dd bs=4096 if=/proc/vmcore of=/dev/null
8307246+1 records in
8307246+1 records out
real    2m 31.50s
user    0m 1.06s
sys     2m 27.60s

So performance is 214.26 [MiB/sec].
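That figure follows from the 31.7GB vmcore size and the "real" time of
the dd run; a small check of the arithmetic:

```c
#include <assert.h>
#include <math.h>

/* Throughput in MiB/sec for copying a vmcore of vmcore_gib GiB
 * in the given number of elapsed seconds. */
static double throughput_mib(double vmcore_gib, double seconds)
{
	return vmcore_gib * 1024.0 / seconds;
}
```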

2) copy /proc/vmcore with mmap()

  I ran the following command and recorded the real time:

  $ for n in $(seq 1 15) ; do \
  >   time copyvmcore2 --blocksize=$((4096 * (1 << (n - 1)))) /proc/vmcore /dev/null \
  > done

  where copyvmcore2 is an ad-hoc test tool that reads data from
  /proc/vmcore via mmap() in units of the given block size and writes
  it to a file.
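  copyvmcore2 itself is not part of this series; a minimal user-space
  sketch of such a tool's core loop (function name, error handling and
  file arguments are illustrative, not the actual tool) could look like:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy file "in" to "out" by mmap()ing "in" in blocks of map_sz
 * bytes (map_sz should be a multiple of the page size). One
 * mmap()/munmap() pair is issued per block, so larger blocks mean
 * fewer page-table setups and TLB flushes. */
static int copy_by_mmap(const char *in, const char *out, size_t map_sz)
{
	int ifd, ofd = -1, ret = -1;
	struct stat st;
	off_t off;

	ifd = open(in, O_RDONLY);
	if (ifd < 0)
		return -1;
	ofd = open(out, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (ofd < 0 || fstat(ifd, &st) < 0)
		goto out;
	for (off = 0; off < st.st_size; off += map_sz) {
		size_t len = (size_t)(st.st_size - off) < map_sz ?
			     (size_t)(st.st_size - off) : map_sz;
		void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, ifd, off);

		if (p == MAP_FAILED || write(ofd, p, len) != (ssize_t)len)
			goto out;
		munmap(p, len);
	}
	ret = 0;
out:
	close(ifd);
	if (ofd >= 0)
		close(ofd);
	return ret;
}
```

  Block sizes that are multiples of the page size keep each mmap()
  offset page-aligned, matching the 4096 * 2^(n-1) series used above.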

|  n | map size |  time | page table | performance |
|    |          | (sec) |            |   [GiB/sec] |
|----+----------+-------+------------+-------------|
|  1 | 4 KiB    | 78.35 | 8 B        |        0.40 |
|  2 | 8 KiB    | 45.29 | 16 B       |        0.70 |
|  3 | 16 KiB   | 23.82 | 32 B       |        1.33 |
|  4 | 32 KiB   | 12.90 | 64 B       |        2.46 |
|  5 | 64 KiB   |  6.13 | 128 B      |        5.17 |
|  6 | 128 KiB  |  3.26 | 256 B      |        9.72 |
|  7 | 256 KiB  |  1.86 | 512 B      |       17.04 |
|  8 | 512 KiB  |  1.13 | 1 KiB      |       28.04 |
|  9 | 1 MiB    |  0.77 | 2 KiB      |       41.16 |
| 10 | 2 MiB    |  0.58 | 4 KiB      |       54.64 |
| 11 | 4 MiB    |  0.50 | 8 KiB      |       63.38 |
| 12 | 8 MiB    |  0.46 | 16 KiB     |       68.89 |
| 13 | 16 MiB   |  0.44 | 32 KiB     |       72.02 |
| 14 | 32 MiB   |  0.44 | 64 KiB     |       72.02 |
| 15 | 64 MiB   |  0.45 | 128 KiB    |       70.42 |
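The "page table" column can be reproduced from the map size, assuming
one 8-byte page-table entry per mapped 4 KiB page:

```c
#include <assert.h>

/* Page-table bytes needed to map map_sz bytes: one 8-byte PTE
 * per 4 KiB page. */
static unsigned long pte_bytes(unsigned long map_sz)
{
	return map_sz / 4096 * 8;
}
```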

3) copy /proc/vmcore with mmap() on /dev/oldmem

I posted another patch series for mmap() on /dev/oldmem a few weeks ago.
See: https://lkml.org/lkml/2013/2/3/431

The following table, taken from that post, shows the benchmark results.

|  n | map size |  time | page table | performance |
|    |          | (sec) |            |   [GiB/sec] |
|----+----------+-------+------------+-------------|
|  1 | 4 KiB    | 41.86 | 8 B        |        0.76 |
|  2 | 8 KiB    | 25.43 | 16 B       |        1.25 |
|  3 | 16 KiB   | 13.28 | 32 B       |        2.39 |
|  4 | 32 KiB   |  7.20 | 64 B       |        4.40 |
|  5 | 64 KiB   |  3.45 | 128 B      |        9.19 |
|  6 | 128 KiB  |  1.82 | 256 B      |       17.42 |
|  7 | 256 KiB  |  1.03 | 512 B      |       30.78 |
|  8 | 512 KiB  |  0.61 | 1 KiB      |       51.97 |
|  9 | 1 MiB    |  0.41 | 2 KiB      |       77.32 |
| 10 | 2 MiB    |  0.32 | 4 KiB      |       99.06 |
| 11 | 4 MiB    |  0.27 | 8 KiB      |      117.41 |
| 12 | 8 MiB    |  0.24 | 16 KiB     |      132.08 |
| 13 | 16 MiB   |  0.23 | 32 KiB     |      137.83 |
| 14 | 32 MiB   |  0.22 | 64 KiB     |      144.09 |
| 15 | 64 MiB   |  0.22 | 128 KiB    |      144.09 |

= Discussion

- For small map sizes, the mmap() case shows performance degradation
  due to frequent page table modifications and TLB flushes, similar to
  the read_oldmem() case. But for large map sizes the performance
  improves considerably.

  Each application needs to choose an appropriate map size for its
  preferred performance.

- mmap() on /dev/oldmem appears faster than mmap() on /proc/vmcore.
  But actual dump processing involves not only copying but also I/O
  work, so this difference is not a problem in practice.

- Both mmap() cases show drastically better performance than the
  previous RFC patch set's roughly 2.5 [GiB/sec], which mapped all dump
  target memory into the kernel direct-mapping address space. This is
  because there is no longer a memcpy() from kernel space to user space.

Design
======

= Support Range

- mmap() on /proc/vmcore is supported on the ELF64 interface only. The
  ELF32 interface is used only if the dump target size is less than
  4GB; in that case, the existing read interface performs well enough.

= Change of /proc/vmcore format

To meet mmap()'s page-size boundary requirement, /proc/vmcore changes
its layout and now places its objects on page-size boundaries.

- Allocate buffer for ELF headers in page-size boundary.
  => See [PATCH 01/13].

- Note objects scattered over old memory are copied into a single
  page-size aligned buffer in the 2nd kernel, and that buffer is
  remapped to user-space.
  => See [PATCH 09/13].
  
- The head and/or tail pages of memory chunks are also copied in the
  2nd kernel if either of their ends is not page-size aligned.
  => See [PATCH 12/13].
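A user-space sketch of the alignment test behind that head/tail
copying (PAGE_SIZE and the helper name are illustrative, not the
patch's actual code):

```c
#include <assert.h>

#define PAGE_SIZE 4096ULL

/* A chunk's head (or tail) page must be copied into the 2nd kernel
 * when its start (or end) address is not page aligned. */
static int needs_partial_copy(unsigned long long addr)
{
	return (addr & (PAGE_SIZE - 1)) != 0;
}
```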

= 32-bit PAE limitation

- On 32-bit PAE, mmap_vmcore() can handle up to 16TB of memory only,
  since remap_pfn_range()'s third argument, pfn, is only 32 bits wide,
  being defined as unsigned long.
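The 16TB figure follows from the pfn width and the page size: 2^32
page frames of 4 KiB each. As a check:

```c
#include <assert.h>

/* Largest physical range addressable when pfn is a 32-bit value:
 * 2^32 page frames of 4 KiB each. */
static unsigned long long max_mappable_bytes(void)
{
	return (1ULL << 32) * 4096;
}
```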

TODO
====

- Modify makedumpfile to use mmap() on /proc/vmcore and benchmark it
  to confirm whether the performance improvement is sufficient.

Test
====

Done on x86-64 and x86-32, each in environments with 1GB of memory and
with over 4GB of memory.

---

HATAYAMA Daisuke (13):
      vmcore: introduce mmap_vmcore()
      vmcore: copy non page-size aligned head and tail pages in 2nd kernel
      vmcore: count holes generated by round-up operation for vmcore size
      vmcore: round-up offset of vmcore object in page-size boundary
      vmcore: copy ELF note segments in buffer on 2nd kernel
      vmcore: remove unused helper function
      vmcore: modify read_vmcore() to read buffer on 2nd kernel
      vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
      vmcore: modify ELF32 code according to new type
      vmcore: introduce types for objects copied in 2nd kernel
      vmcore: fill unused part of buffer for ELF headers with 0
      vmcore: round up buffer size of ELF headers by PAGE_SIZE
      vmcore: allocate buffer for ELF headers on page-size alignment


 fs/proc/vmcore.c        |  408 +++++++++++++++++++++++++++++++++++------------
 include/linux/proc_fs.h |   11 +
 2 files changed, 313 insertions(+), 106 deletions(-)

-- 

Thanks.
HATAYAMA, Daisuke

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 01/13] vmcore: allocate buffer for ELF headers on page-size alignment
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:11   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:11 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

Allocate the buffer for ELF headers on a page-size aligned boundary to
satisfy the mmap() requirement. For this, __get_free_pages() is used
instead of kmalloc().

Also, a later patch will decrease the actually used buffer size for
ELF headers, so it's necessary to keep the original buffer size and
the actually used buffer size separately: elfcorebuf_sz_orig keeps the
original one and elfcorebuf_sz the actually used one.
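__get_free_pages() allocates 2^order contiguous pages, with
get_order() picking the smallest sufficient order. A user-space sketch
of that rounding (not the kernel's implementation):

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Same rounding as the kernel's get_order(): smallest order such
 * that (PAGE_SIZE << order) >= size, for size > 0. */
static int order_for(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}
```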

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   30 +++++++++++++++++++++---------
 1 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 0d5071d..85714c3 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -30,6 +30,7 @@ static LIST_HEAD(vmcore_list);
 /* Stores the pointer to the buffer containing kernel elf core headers. */
 static char *elfcorebuf;
 static size_t elfcorebuf_sz;
+static size_t elfcorebuf_sz_orig;
 
 /* Total size of vmcore file. */
 static u64 vmcore_size;
@@ -560,26 +561,31 @@ static int __init parse_crash_elf64_headers(void)
 
 	/* Read in all elf headers. */
 	elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
-	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+	elfcorebuf_sz_orig = elfcorebuf_sz;
+	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					       get_order(elfcorebuf_sz));
 	if (!elfcorebuf)
 		return -ENOMEM;
 	addr = elfcorehdr_addr;
 	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
 	if (rc < 0) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 
 	/* Merge all PT_NOTE headers into one. */
 	rc = merge_note_headers_elf64(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf64(elfcorebuf, elfcorebuf_sz,
 							&vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
@@ -616,26 +622,31 @@ static int __init parse_crash_elf32_headers(void)
 
 	/* Read in all elf headers. */
 	elfcorebuf_sz = sizeof(Elf32_Ehdr) + ehdr.e_phnum * sizeof(Elf32_Phdr);
-	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
+	elfcorebuf_sz_orig = elfcorebuf_sz;
+	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					       get_order(elfcorebuf_sz));
 	if (!elfcorebuf)
 		return -ENOMEM;
 	addr = elfcorehdr_addr;
 	rc = read_from_oldmem(elfcorebuf, elfcorebuf_sz, &addr, 0);
 	if (rc < 0) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 
 	/* Merge all PT_NOTE headers into one. */
 	rc = merge_note_headers_elf32(elfcorebuf, &elfcorebuf_sz, &vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	rc = process_ptload_program_headers_elf32(elfcorebuf, elfcorebuf_sz,
 								&vmcore_list);
 	if (rc) {
-		kfree(elfcorebuf);
+		free_pages((unsigned long)elfcorebuf,
+			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
 	set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
@@ -719,7 +730,8 @@ void vmcore_cleanup(void)
 		list_del(&m->list);
 		kfree(m);
 	}
-	kfree(elfcorebuf);
+	free_pages((unsigned long)elfcorebuf,
+		   get_order(elfcorebuf_sz_orig));
 	elfcorebuf = NULL;
 }
 EXPORT_SYMBOL_GPL(vmcore_cleanup);



* [PATCH 02/13] vmcore: round up buffer size of ELF headers by PAGE_SIZE
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:11   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:11 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

To satisfy mmap()'s page-size boundary requirement, round up the
buffer size of the ELF headers to PAGE_SIZE. The resulting value
becomes the offset of the ELF note segments and is assigned to the
unique PT_NOTE program header entry.

Also, code that assumed the previous ELF headers' size is changed to
use this new rounded-up value.
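A user-space sketch of the roundup() arithmetic used here (header
sizes follow the ELF64 format; the two-phdr layout after merging is
illustrative):

```c
#include <assert.h>
#include <elf.h>

#define PAGE_SIZE 4096UL
/* Same shape as the kernel's roundup() macro */
#define roundup(x, y)  ((((x) + (y) - 1) / (y)) * (y))
```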

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 85714c3..5010ead 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -313,7 +313,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	phdr.p_flags   = 0;
 	note_off = sizeof(Elf64_Ehdr) +
 			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
-	phdr.p_offset  = note_off;
+	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
 	phdr.p_vaddr   = phdr.p_paddr = 0;
 	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
 	phdr.p_align   = 0;
@@ -331,6 +331,8 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	/* Modify e_phnum to reflect merged headers. */
 	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
 
+	*elfsz = roundup(*elfsz, PAGE_SIZE);
+
 	return 0;
 }
 
@@ -431,9 +433,8 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
 
 	/* First program header is PT_NOTE header. */
-	vmcore_off = sizeof(Elf64_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr) +
-			phdr_ptr->p_memsz; /* Note sections */
+	vmcore_off = phdr_ptr->p_offset + roundup(phdr_ptr->p_memsz,
+						  PAGE_SIZE);
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
 		if (phdr_ptr->p_type != PT_LOAD)



* [PATCH 03/13] vmcore: fill unused part of buffer for ELF headers with 0
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:11   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:11 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

Via mmap() we export the range [elfcorebuf_sz, roundup(elfcorebuf_sz,
PAGE_SIZE)] to user-space. We need to fill this range with 0.
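Merging nr_ptnote PT_NOTE entries into one frees (nr_ptnote - 1)
Elf64_Phdr slots immediately after the merged header table; the memset
added by this patch covers exactly that region. A sketch of the
arithmetic (the header counts are illustrative):

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Offset of the first program-header slot freed by merging nr_ptnote
 * PT_NOTE entries into a single one. */
static size_t unused_off(int e_phnum, int nr_ptnote)
{
	return sizeof(Elf64_Ehdr) +
	       (e_phnum - nr_ptnote + 1) * sizeof(Elf64_Phdr);
}

/* Length of the freed region that must be zeroed. */
static size_t unused_len(int nr_ptnote)
{
	return (nr_ptnote - 1) * sizeof(Elf64_Phdr);
}
```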

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 5010ead..43d338a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -328,6 +328,11 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 	*elfsz = *elfsz - i;
 	memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
 
+	/* Fill unused part with zero */
+	memset(elfptr + sizeof(Elf64_Ehdr) +
+	       (ehdr_ptr->e_phnum - nr_ptnote + 1) * sizeof(Elf64_Phdr), 0,
+	       (nr_ptnote - 1) * sizeof(Elf64_Phdr));
+
 	/* Modify e_phnum to reflect merged headers. */
 	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
 



* [PATCH 04/13] vmcore: introduce types for objects copied in 2nd kernel
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

Some parts of old memory need to be copied into buffers in the 2nd
kernel so they can be remapped to user-space. To distinguish objects
in a buffer in the 2nd kernel from those in old memory, enum
vmcore_type is introduced: an object in a 2nd-kernel buffer has type
VMCORE_2ND_KERNEL, and one in old memory has type VMCORE_OLD_MEMORY.
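A user-space sketch of how the type field discriminates the union (the
struct layout mirrors the patch; the helper and the sample values are
illustrative):

```c
#include <assert.h>
#include <stddef.h>

enum vmcore_type {
	VMCORE_OLD_MEMORY,   /* object still resides in old (crashed) memory */
	VMCORE_2ND_KERNEL,   /* object was copied into a 2nd-kernel buffer  */
};

struct vmcore_obj {
	union {
		unsigned long long paddr;  /* valid for VMCORE_OLD_MEMORY */
		char *buf;                 /* valid for VMCORE_2ND_KERNEL */
	};
	unsigned long long size;
	enum vmcore_type type;       /* says which union member is live */
};

/* Illustrative helper: is this object backed by a 2nd-kernel buffer? */
static int is_copied(const struct vmcore_obj *o)
{
	return o->type == VMCORE_2ND_KERNEL;
}
```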

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 include/linux/proc_fs.h |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 32676b3..4b153ed 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -97,11 +97,20 @@ struct kcore_list {
 	int type;
 };
 
+enum vmcore_type {
+	VMCORE_OLD_MEMORY,
+	VMCORE_2ND_KERNEL,
+};
+
 struct vmcore {
 	struct list_head list;
-	unsigned long long paddr;
+	union {
+		unsigned long long paddr;
+		char *buf;
+	};
 	unsigned long long size;
 	loff_t offset;
+	enum vmcore_type type;
 };
 
 #ifdef CONFIG_PROC_FS



* [PATCH 05/13] vmcore: modify ELF32 code according to new type
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

mmap() is not supported on ELF32. All vmcore objects reside in old
memory, so mark each of them with the VMCORE_OLD_MEMORY type.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 43d338a..7e3f922 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -389,6 +389,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
 			kfree(notes_section);
 			return -ENOMEM;
 		}
+		new->type = VMCORE_OLD_MEMORY;
 		new->paddr = phdr_ptr->p_offset;
 		new->size = real_sz;
 		list_add_tail(&new->list, vc_list);
@@ -486,6 +487,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 		new = get_new_element();
 		if (!new)
 			return -ENOMEM;
+		new->type = VMCORE_OLD_MEMORY;
 		new->paddr = phdr_ptr->p_offset;
 		new->size = phdr_ptr->p_memsz;
 		list_add_tail(&new->list, vc_list);


^ permalink raw reply related	[flat|nested] 66+ messages in thread


* [PATCH 06/13] vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

A vmcore object of VMCORE_2ND_KERNEL type owns a buffer allocated in
the 2nd kernel, which must be freed at clean-up time.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 7e3f922..77e0a0e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -735,6 +735,15 @@ void vmcore_cleanup(void)
 		struct vmcore *m;
 
 		m = list_entry(pos, struct vmcore, list);
+
+		switch (m->type) {
+		case VMCORE_OLD_MEMORY:
+			break;
+		case VMCORE_2ND_KERNEL:
+			free_pages((unsigned long)m->buf, get_order(m->size));
+			break;
+		}
+
 		list_del(&m->list);
 		kfree(m);
 	}


^ permalink raw reply related	[flat|nested] 66+ messages in thread
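The clean-up rule in this patch can be sketched in userspace C. In the sketch below, `get_order()` is a simplified stand-in for the kernel helper (smallest order n such that PAGE_SIZE << n covers the size), `vmcore_free_element()` is a hypothetical name, and plain free() stands in for free_pages(); none of these are kernel code as-is.

```c
#include <assert.h>
#include <stdlib.h>

enum vmcore_type { VMCORE_OLD_MEMORY, VMCORE_2ND_KERNEL };

struct vmcore {
	union { unsigned long long paddr; char *buf; };
	unsigned long long size;
	enum vmcore_type type;
};

#define PAGE_SIZE 4096UL

/* Stand-in for the kernel's get_order(): smallest order n such that
 * (PAGE_SIZE << n) >= size, for size >= 1. */
static int get_order(unsigned long size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Free the buffer if (and only if) the element owns one: only
 * VMCORE_2ND_KERNEL elements carry a 2nd-kernel allocation, while
 * VMCORE_OLD_MEMORY elements merely reference old memory.
 * Returns 1 if a buffer was freed, 0 if there was nothing to free. */
static int vmcore_free_element(struct vmcore *m)
{
	switch (m->type) {
	case VMCORE_OLD_MEMORY:
		return 0;		/* references old memory; no allocation */
	case VMCORE_2ND_KERNEL:
		free(m->buf);		/* kernel uses free_pages() here */
		m->buf = NULL;
		return 1;
	}
	return -1;
}
```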


* [PATCH 07/13] vmcore: modify read_vmcore() to read buffer on 2nd kernel
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

If a given vmcore object has VMCORE_2ND_KERNEL type, the target data
resides in a buffer in the 2nd kernel, so it is copied with
copy_to_user() instead of being read through read_from_oldmem().

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   64 ++++++++++++++++++++++++++----------------------------
 1 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 77e0a0e..4125a65 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -146,8 +146,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 {
 	ssize_t acc = 0, tmp;
 	size_t tsz;
-	u64 start, nr_bytes;
-	struct vmcore *curr_m = NULL;
+	struct vmcore *m;
 
 	if (buflen == 0 || *fpos >= vmcore_size)
 		return 0;
@@ -173,39 +172,38 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 			return acc;
 	}
 
-	start = map_offset_to_paddr(*fpos, &vmcore_list, &curr_m);
-	if (!curr_m)
-        	return -EINVAL;
-	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
-		tsz = buflen;
-
-	/* Calculate left bytes in current memory segment. */
-	nr_bytes = (curr_m->size - (start - curr_m->paddr));
-	if (tsz > nr_bytes)
-		tsz = nr_bytes;
-
-	while (buflen) {
-		tmp = read_from_oldmem(buffer, tsz, &start, 1);
-		if (tmp < 0)
-			return tmp;
-		buflen -= tsz;
-		*fpos += tsz;
-		buffer += tsz;
-		acc += tsz;
-		if (start >= (curr_m->paddr + curr_m->size)) {
-			if (curr_m->list.next == &vmcore_list)
-				return acc;	/*EOF*/
-			curr_m = list_entry(curr_m->list.next,
-						struct vmcore, list);
-			start = curr_m->paddr;
+	list_for_each_entry(m, &vmcore_list, list) {
+		if (*fpos < m->offset + m->size) {
+			tsz = m->offset + m->size - *fpos;
+			if (buflen < tsz)
+				tsz = buflen;
+			switch (m->type) {
+			case VMCORE_OLD_MEMORY: {
+				u64 paddr = m->paddr + *fpos - m->offset;
+
+				tmp = read_from_oldmem(buffer, tsz, &paddr, 1);
+				if (tmp < 0)
+					return tmp;
+				break;
+			}
+			case VMCORE_2ND_KERNEL:
+				if (copy_to_user(buffer,
+						 m->buf + (*fpos - m->offset),
+						 tsz))
+					return -EFAULT;
+				break;
+			}
+			buflen -= tsz;
+			*fpos += tsz;
+			buffer += tsz;
+			acc += tsz;
+
+			/* leave now if filled buffer already */
+			if (buflen == 0)
+				return acc;
 		}
-		if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
-			tsz = buflen;
-		/* Calculate left bytes in current memory segment. */
-		nr_bytes = (curr_m->size - (start - curr_m->paddr));
-		if (tsz > nr_bytes)
-			tsz = nr_bytes;
 	}
+
 	return acc;
 }
 


^ permalink raw reply related	[flat|nested] 66+ messages in thread
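The core of the rewritten loop is finding, for each segment in file-offset order, how many bytes of the request the segment can satisfy. That calculation can be sketched in userspace C (names here are illustrative, not the kernel's):

```c
#include <assert.h>

/* Userspace sketch of the per-segment size calculation done by the
 * rewritten read_vmcore(): given a segment's file offset and size, a
 * current file position fpos, and the remaining request length buflen,
 * return how many bytes this segment contributes. */
struct segment {
	long long offset;	/* file offset of this segment */
	long long size;		/* length of this segment in the file */
};

static long long chunk_in_segment(const struct segment *m,
				  long long fpos, long long buflen)
{
	long long tsz;

	if (fpos >= m->offset + m->size)
		return 0;			/* position past this segment */
	tsz = m->offset + m->size - fpos;	/* bytes left in segment */
	if (buflen < tsz)
		tsz = buflen;			/* capped by the request */
	return tsz;
}
```

The caller would then dispatch on the segment's type, copy `tsz` bytes from either old memory or the 2nd-kernel buffer, and advance fpos, exactly as the list_for_each_entry() loop above does.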


* [PATCH 08/13] vmcore: remove unused helper function
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

Remove map_offset_to_paddr(), which is no longer used.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   21 ---------------------
 1 files changed, 0 insertions(+), 21 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 4125a65..3aedb52 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -117,27 +117,6 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
 	return read;
 }
 
-/* Maps vmcore file offset to respective physical address in memroy. */
-static u64 map_offset_to_paddr(loff_t offset, struct list_head *vc_list,
-					struct vmcore **m_ptr)
-{
-	struct vmcore *m;
-	u64 paddr;
-
-	list_for_each_entry(m, vc_list, list) {
-		u64 start, end;
-		start = m->offset;
-		end = m->offset + m->size - 1;
-		if (offset >= start && offset <= end) {
-			paddr = m->paddr + offset - start;
-			*m_ptr = m;
-			return paddr;
-		}
-	}
-	*m_ptr = NULL;
-	return 0;
-}
-
 /* Read from the ELF header and then the crash dump. On error, negative value is
  * returned otherwise number of bytes read are returned.
  */


^ permalink raw reply related	[flat|nested] 66+ messages in thread


* [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

The objects exported through ELF note segments are in fact scattered
across old memory, but /proc/vmcore exports them as a single merged
ELF note segment. To satisfy mmap()'s page-size boundary requirement,
copy them into a page-size aligned buffer allocated with
__get_free_pages() in the 2nd kernel and remap that buffer to
user-space.

The buffer holding the ELF note segments is added to vmcore_list as an
object of VMCORE_2ND_KERNEL type.

The copy is done in two passes: the first pass calculates the real
total size of the ELF note segments, and the second pass copies the
segment data into a buffer of exactly that size.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   78 +++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 3aedb52..ccf0dc5 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -230,27 +230,25 @@ static u64 __init get_vmcore_size_elf32(char *elfptr)
 	return size;
 }
 
-/* Merges all the PT_NOTE headers into one. */
-static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
-						struct list_head *vc_list)
+static int __init parse_note_segments_elf64(char *elfptr, int *nr_ptnote,
+					    u64 *phdr_sz, char *notebuf)
 {
-	int i, nr_ptnote=0, rc=0;
-	char *tmp;
+	int i, rc=0;
+	loff_t notebuf_off = 0;
 	Elf64_Ehdr *ehdr_ptr;
-	Elf64_Phdr phdr, *phdr_ptr;
 	Elf64_Nhdr *nhdr_ptr;
-	u64 phdr_sz = 0, note_off;
+	Elf64_Phdr *phdr_ptr;
 
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
 		int j;
 		void *notes_section;
-		struct vmcore *new;
 		u64 offset, max_sz, sz, real_sz = 0;
 		if (phdr_ptr->p_type != PT_NOTE)
 			continue;
-		nr_ptnote++;
+		if (nr_ptnote)
+			*nr_ptnote = *nr_ptnote + 1;
 		max_sz = phdr_ptr->p_memsz;
 		offset = phdr_ptr->p_offset;
 		notes_section = kmalloc(max_sz, GFP_KERNEL);
@@ -271,20 +269,51 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 			real_sz += sz;
 			nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
 		}
-
-		/* Add this contiguous chunk of notes section to vmcore list.*/
-		new = get_new_element();
-		if (!new) {
-			kfree(notes_section);
-			return -ENOMEM;
+		if (phdr_sz)
+			*phdr_sz += real_sz;
+		if (notebuf) {
+			memcpy(notebuf + notebuf_off, notes_section, real_sz);
+			notebuf_off += real_sz;
 		}
-		new->paddr = phdr_ptr->p_offset;
-		new->size = real_sz;
-		list_add_tail(&new->list, vc_list);
-		phdr_sz += real_sz;
 		kfree(notes_section);
 	}
 
+	return 0;
+}
+
+/* Merges all the PT_NOTE headers into one. */
+static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
+						struct list_head *vc_list)
+{
+	int i, nr_ptnote, rc=0;
+	char *tmp, *notebuf;
+	Elf64_Ehdr *ehdr_ptr;
+	Elf64_Phdr phdr;
+	u64 phdr_sz, note_off, notebuf_sz;
+	struct vmcore *new;
+
+	ehdr_ptr = (Elf64_Ehdr *)elfptr;
+
+	/* The 1st pass calculates real size of ELF note segments. */
+	nr_ptnote = 0;
+	phdr_sz = 0;
+	rc = parse_note_segments_elf64(elfptr, &nr_ptnote, &phdr_sz, NULL);
+	if (rc < 0)
+		return rc;
+
+	/* The 2nd pass copies the ELF note segments into the buffer
+	 * of the exact size. */
+	notebuf_sz = roundup(phdr_sz, PAGE_SIZE);
+	notebuf = (char *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					    get_order(notebuf_sz));
+	if (!notebuf)
+		return -ENOMEM;
+	rc = parse_note_segments_elf64(elfptr, NULL, NULL, notebuf);
+	if (rc < 0) {
+		free_pages((unsigned long)notebuf, get_order(notebuf_sz));
+		return rc;
+	}
+
 	/* Prepare merged PT_NOTE program header. */
 	phdr.p_type    = PT_NOTE;
 	phdr.p_flags   = 0;
@@ -315,6 +344,17 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
 
 	*elfsz = roundup(*elfsz, PAGE_SIZE);
 
+	/* Add the merged unique ELF note segments in vmcore_list. */
+	new = get_new_element();
+	if (!new) {
+		free_pages((unsigned long)notebuf, get_order(notebuf_sz));
+		return -ENOMEM;
+	}
+	new->type = VMCORE_2ND_KERNEL;
+	new->buf = notebuf;
+	new->size = notebuf_sz;
+	list_add_tail(&new->list, vc_list);
+
 	return 0;
 }
 


^ permalink raw reply related	[flat|nested] 66+ messages in thread
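The two-pass pattern used by parse_note_segments_elf64() — the same walk first measuring, then copying, depending on which output pointers are non-NULL — can be sketched in userspace C. This is a hedged illustration: `gather_notes()` and `gather_notes_demo()` are hypothetical names, and plain strings stand in for ELF note data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One walk, two uses: with out == NULL it only measures (pass 1);
 * with a buffer it copies the data contiguously (pass 2). Returns the
 * total number of bytes measured/copied. */
static size_t gather_notes(const char *const *notes, int n, char *out)
{
	size_t off = 0;
	int i;

	for (i = 0; i < n; i++) {
		size_t sz = strlen(notes[i]);

		if (out)
			memcpy(out + off, notes[i], sz);	/* pass 2 */
		off += sz;					/* pass 1 */
	}
	return off;
}

/* Demo of the full scheme: measure, allocate exactly, copy.
 * Returns 1 on success. */
static int gather_notes_demo(void)
{
	const char *notes[] = { "NT_PRSTATUS", "NT_TASKSTRUCT" };
	size_t sz = gather_notes(notes, 2, NULL);	/* 1st pass */
	char *buf = calloc(1, sz + 1);
	int ok;

	if (!buf)
		return -1;
	gather_notes(notes, 2, buf);			/* 2nd pass */
	ok = strncmp(buf, "NT_PRSTATUSNT_TASKSTRUCT", sz) == 0;
	free(buf);
	return ok;
}
```

The kernel version additionally rounds the measured size up to a page and allocates with __get_free_pages() so that the result can be mmap()ed.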


* [PATCH 10/13] vmcore: round-up offset of vmcore object in page-size boundary
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

To satisfy mmap()'s page-size boundary requirement, round up the
offset of each vmcore object to a page-size boundary.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index ccf0dc5..afedb5f 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -517,7 +517,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
 }
 
 /* Sets offset fields of vmcore elements. */
-static void __init set_vmcore_list_offsets_elf64(char *elfptr,
+static void __init set_vmcore_list_offsets_elf64(char *elfptr, size_t elfsz,
 						struct list_head *vc_list)
 {
 	loff_t vmcore_off;
@@ -527,12 +527,11 @@ static void __init set_vmcore_list_offsets_elf64(char *elfptr,
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf64_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
+	vmcore_off = elfsz;
 
 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
-		vmcore_off += m->size;
+		vmcore_off += roundup(m->size, PAGE_SIZE);
 	}
 }
 
@@ -613,7 +612,7 @@ static int __init parse_crash_elf64_headers(void)
 			   get_order(elfcorebuf_sz_orig));
 		return rc;
 	}
-	set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+	set_vmcore_list_offsets_elf64(elfcorebuf, elfcorebuf_sz, &vmcore_list);
 	return 0;
 }
 


^ permalink raw reply related	[flat|nested] 66+ messages in thread
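The offset assignment after this patch is simple cumulative page arithmetic. The userspace sketch below (PAGE_SIZE assumed 4096, helper name hypothetical) computes the file offset of the i-th vmcore object the same way set_vmcore_list_offsets_elf64() walks the list: start at the already page-aligned header size elfsz and add each preceding object's page-rounded size.

```c
#include <assert.h>

#define PAGE_SIZE 4096ULL
#define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* File offset of object i, given the sizes of all objects and the
 * page-aligned size of the ELF headers that precede them. Every
 * returned offset is a multiple of PAGE_SIZE, as mmap() requires. */
static unsigned long long object_offset(const unsigned long long *sizes,
					int i, unsigned long long elfsz)
{
	unsigned long long off = elfsz;
	int j;

	for (j = 0; j < i; j++)
		off += ROUNDUP(sizes[j], PAGE_SIZE);
	return off;
}
```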


* [PATCH 11/13] vmcore: count holes generated by round-up operation for vmcore size
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

The previous patch rounded up the offset of each vmcore object to a
page-size boundary. The vmcore size must now account for the holes
generated by that rounding.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index afedb5f..2968e5a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -196,7 +196,7 @@ static struct vmcore* __init get_new_element(void)
 	return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
 }
 
-static u64 __init get_vmcore_size_elf64(char *elfptr)
+static u64 __init get_vmcore_size_elf64(char *elfptr, size_t elfsz)
 {
 	int i;
 	u64 size;
@@ -205,9 +205,9 @@ static u64 __init get_vmcore_size_elf64(char *elfptr)
 
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr));
-	size = sizeof(Elf64_Ehdr) + ((ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr));
+	size = elfsz;
 	for (i = 0; i < ehdr_ptr->e_phnum; i++) {
-		size += phdr_ptr->p_memsz;
+		size += roundup(phdr_ptr->p_memsz, PAGE_SIZE);
 		phdr_ptr++;
 	}
 	return size;
@@ -699,7 +699,7 @@ static int __init parse_crash_elf_headers(void)
 			return rc;
 
 		/* Determine vmcore size. */
-		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+		vmcore_size = get_vmcore_size_elf64(elfcorebuf, elfcorebuf_sz);
 	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
 		rc = parse_crash_elf32_headers();
 		if (rc)


^ permalink raw reply related	[flat|nested] 66+ messages in thread
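The size calculation after this patch can be sketched in userspace C (PAGE_SIZE assumed 4096, name hypothetical): the total is the page-aligned header size plus each segment's memory size rounded up to a page, so the alignment holes introduced by the previous patch are counted.

```c
#include <assert.h>

#define PAGE_SIZE 4096ULL
#define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Total /proc/vmcore size: page-aligned ELF header size elfsz plus
 * the page-rounded size of every PT_LOAD segment, matching the
 * page-rounded offsets assigned by the previous patch. */
static unsigned long long vmcore_size(unsigned long long elfsz,
				      const unsigned long long *memsz, int n)
{
	unsigned long long size = elfsz;
	int i;

	for (i = 0; i < n; i++)
		size += ROUNDUP(memsz[i], PAGE_SIZE);
	return size;
}
```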


* [PATCH 12/13] vmcore: copy non page-size aligned head and tail pages in 2nd kernel
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

Due to mmap()'s page-size boundary requirement, pages that do not
start or end on a page-size aligned address need to be copied into
buffers in the 2nd kernel and mapped to user-space from there.

For example, see the map below:

    00000000-0000ffff : reserved
    00010000-0009f7ff : System RAM
    0009f800-0009ffff : reserved

where System RAM ends at 0x9f800, which is not page-size aligned. This
map is divided into two parts:

    00010000-0009efff
    0009f000-0009f7ff

The first part is kept in old memory, while the second is copied into
a buffer in the 2nd kernel.

This kind of non-page-size-aligned area can always occur, since any
part of System RAM can be converted into a reserved area at runtime.

Without this copying, i.e. if non-page-size-aligned pages in old
memory were remapped directly, mmap() would have to export memory that
is not part of the dump target to user-space. In the example above,
that is the reserved area 0x9f800-0xa0000.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   97 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 2968e5a..99f5673 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -446,11 +446,10 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 						size_t elfsz,
 						struct list_head *vc_list)
 {
-	int i;
+	int i, rc;
 	Elf64_Ehdr *ehdr_ptr;
 	Elf64_Phdr *phdr_ptr;
 	loff_t vmcore_off;
-	struct vmcore *new;
 
 	ehdr_ptr = (Elf64_Ehdr *)elfptr;
 	phdr_ptr = (Elf64_Phdr*)(elfptr + sizeof(Elf64_Ehdr)); /* PT_NOTE hdr */
@@ -460,20 +459,98 @@ static int __init process_ptload_program_headers_elf64(char *elfptr,
 						  PAGE_SIZE);
 
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
+		u64 start, end, rest;
+
 		if (phdr_ptr->p_type != PT_LOAD)
 			continue;
 
-		/* Add this contiguous chunk of memory to vmcore list.*/
-		new = get_new_element();
-		if (!new)
-			return -ENOMEM;
-		new->paddr = phdr_ptr->p_offset;
-		new->size = phdr_ptr->p_memsz;
-		list_add_tail(&new->list, vc_list);
+		start = phdr_ptr->p_offset;
+		end = phdr_ptr->p_offset + phdr_ptr->p_memsz;
+		rest = phdr_ptr->p_memsz;
+
+		if (start & ~PAGE_MASK) {
+			u64 paddr, len;
+			char *buf;
+			struct vmcore *new;
+
+			paddr = start;
+			len = min(roundup(start,PAGE_SIZE), end) - start;
+
+			buf = (char *)get_zeroed_page(GFP_KERNEL);
+			if (!buf)
+				return -ENOMEM;
+			rc = read_from_oldmem(buf + (start & ~PAGE_MASK), len,
+					      &paddr, 0);
+			if (rc < 0) {
+				free_pages((unsigned long)buf, 0);
+				return rc;
+			}
+
+			new = get_new_element();
+			if (!new) {
+				free_pages((unsigned long)buf, 0);
+				return -ENOMEM;
+			}
+			new->type = VMCORE_2ND_KERNEL;
+			new->size = PAGE_SIZE;
+			new->buf = buf;
+			list_add_tail(&new->list, vc_list);
+
+			rest -= len;
+		}
+
+		if (rest > 0 &&
+		    roundup(start, PAGE_SIZE) < rounddown(end, PAGE_SIZE)) {
+			u64 paddr, len;
+			struct vmcore *new;
+
+			paddr = roundup(start, PAGE_SIZE);
+			len = rounddown(end, PAGE_SIZE) - roundup(start, PAGE_SIZE);
+
+			new = get_new_element();
+			if (!new)
+				return -ENOMEM;
+			new->type = VMCORE_OLD_MEMORY;
+			new->paddr = paddr;
+			new->size = len;
+			list_add_tail(&new->list, vc_list);
+
+			rest -= new->size;
+		}
+
+		if (rest > 0) {
+			u64 paddr, len;
+			char *buf;
+			struct vmcore *new;
+
+			paddr = rounddown(end, PAGE_SIZE);
+			len = end - rounddown(end, PAGE_SIZE);
+
+			buf = (char *)get_zeroed_page(GFP_KERNEL);
+			if (!buf)
+				return -ENOMEM;
+			rc = read_from_oldmem(buf, len, &paddr, 0);
+			if (rc < 0) {
+				free_pages((unsigned long)buf, 0);
+				return rc;
+			}
+
+			new = get_new_element();
+			if (!new) {
+				free_pages((unsigned long)buf, 0);
+				return -ENOMEM;
+			}
+			new->type = VMCORE_2ND_KERNEL;
+			new->size = PAGE_SIZE;
+			new->buf = buf;
+			list_add_tail(&new->list, vc_list);
+
+			rest -= len;
+		}
 
 		/* Update the program header offset. */
 		phdr_ptr->p_offset = vmcore_off;
-		vmcore_off = vmcore_off + phdr_ptr->p_memsz;
+		vmcore_off += roundup(end, PAGE_SIZE) - rounddown(start, PAGE_SIZE);
 	}
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 13/13] vmcore: introduce mmap_vmcore()
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-14 10:12   ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-14 10:12 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

This patch introduces mmap_vmcore().

If a vmcore object has type VMCORE_OLD_MEMORY, a page on old memory
is remapped; if it has type VMCORE_2ND_KERNEL, a buffer on the 2nd
kernel is remapped.

Neither writable nor executable mappings are permitted, even via
mprotect(). A non-writable mapping is also a requirement of
remap_pfn_range() when mapping linear pages onto non-consecutive
physical pages; see is_cow_mapping().

On ELF32, mmap() is not supported and returns -ENODEV, since the dump
file size is then less than 4GB and the existing read() interface is
sufficient.

On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is a 32-bit unsigned long on x86-32.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---

 fs/proc/vmcore.c |   76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 76 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 99f5673..f521480 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -186,9 +186,85 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 	return acc;
 }
 
+static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
+{
+	unsigned char *e_ident = (unsigned char *)elfcorebuf;
+	size_t size = vma->vm_end - vma->vm_start;
+	u64 start, end, len, tsz;
+	struct vmcore *m;
+
+	if (e_ident[EI_CLASS] == ELFCLASS32)
+		return -ENODEV;
+
+	start = (u64)vma->vm_pgoff << PAGE_SHIFT;
+	end = start + size;
+
+	if (size > vmcore_size || end > vmcore_size)
+		return -EINVAL;
+
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+		return -EPERM;
+
+	vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+	len = 0;
+
+	if (start < elfcorebuf_sz) {
+		u64 pfn;
+
+		tsz = elfcorebuf_sz - start;
+		if (size < tsz)
+			tsz = size;
+		pfn = __pa(elfcorebuf + start) >> PAGE_SHIFT;
+		if (remap_pfn_range(vma, vma->vm_start, pfn, tsz,
+				    vma->vm_page_prot))
+			return -EAGAIN;
+		size -= tsz;
+		start += tsz;
+		len += tsz;
+
+		if (size == 0)
+			return 0;
+	}
+
+	list_for_each_entry(m, &vmcore_list, list) {
+		if (start < m->offset + m->size) {
+			u64 pfn = 0;
+
+			tsz = m->offset + m->size - start;
+			if (size < tsz)
+				tsz = size;
+			switch (m->type) {
+			case VMCORE_OLD_MEMORY:
+				pfn = (m->paddr + (start - m->offset))
+					>> PAGE_SHIFT;
+				break;
+			case VMCORE_2ND_KERNEL:
+				pfn = __pa(m->buf + start - m->offset)
+					>> PAGE_SHIFT;
+				break;
+			}
+			if (remap_pfn_range(vma, vma->vm_start + len, pfn, tsz,
+					    vma->vm_page_prot)) {
+				do_munmap(vma->vm_mm, vma->vm_start, len);
+				return -EAGAIN;
+			}
+			size -= tsz;
+			start += tsz;
+			len += tsz;
+
+			if (size == 0)
+				return 0;
+		}
+	}
+
+	return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
 	.read		= read_vmcore,
 	.llseek		= default_llseek,
+	.mmap		= mmap_vmcore,
 };
 
 static struct vmcore* __init get_new_element(void)


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-02-15  3:57   ` Atsushi Kumagai
  -1 siblings, 0 replies; 66+ messages in thread
From: Atsushi Kumagai @ 2013-02-15  3:57 UTC (permalink / raw)
  To: d.hatayama; +Cc: ebiederm, vgoyal, cpw, lisa.mitchell, kexec, linux-kernel

Hello HATAYAMA-san,

On Thu, 14 Feb 2013 19:11:43 +0900
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

> Currently, read to /proc/vmcore is done by read_oldmem() that uses
> ioremap/iounmap per a single page. For example, if memory is 1GB,
> ioremap/iounmap is called (1GB / 4KB)-times, that is, 262144
> times. This causes big performance degradation.
> 
> To address the issue, this patch implements mmap() on /proc/vmcore to
> improve read performance. My simple benchmark shows the improvement
> from 200 [MiB/sec] to over 50.0 [GiB/sec].

Thanks for your hard work, I think it's a good enough improvement.
 
> Benchmark
> =========
> 
> = Machine spec
>   - CPU: Intel(R) Xeon(R) CPU E7- 4820 @ 2.00GHz (4 sockets, 8 cores) (*)
>   - memory: 32GB
>   - kernel: 3.8-rc6 with this patch
>   - vmcore size: 31.7GB
> 
>   (*) only 1 cpu is used in the 2nd kernel now.
> 
> = Benchmark Case
> 
> 1) copy /proc/vmcore *WITHOUT* mmap() on /proc/vmcore
> 
> $ time dd bs=4096 if=/proc/vmcore of=/dev/null
> 8307246+1 records in
> 8307246+1 records out
> real    2m 31.50s
> user    0m 1.06s
> sys     2m 27.60s
> 
> So performance is 214.26 [MiB/sec].
> 
> 2) copy /proc/vmcore with mmap()
> 
>   I ran the next command and recorded real time:
> 
>   $ for n in $(seq 1 15) ; do \
>   >   time copyvmcore2 --blocksize=$((4096 * (1 << (n - 1)))) /proc/vmcore /dev/null \
>   > done
> 
>   where copyvmcore2 is an ad-hoc test tool that read data from
>   /proc/vmcore via mmap() in given block-size unit and write them to
>   some file.
> 
> |  n | map size |  time | page table | performance |
> |    |          | (sec) |            |   [GiB/sec] |
> |----+----------+-------+------------+-------------|
> |  1 | 4 KiB    | 78.35 | 8 iB       |        0.40 |
> |  2 | 8 KiB    | 45.29 | 16 iB      |        0.70 |
> |  3 | 16 KiB   | 23.82 | 32 iB      |        1.33 |
> |  4 | 32 KiB   | 12.90 | 64 iB      |        2.46 |
> |  5 | 64 KiB   |  6.13 | 128 iB     |        5.17 |
> |  6 | 128 KiB  |  3.26 | 256 iB     |        9.72 |
> |  7 | 256 KiB  |  1.86 | 512 iB     |       17.04 |
> |  8 | 512 KiB  |  1.13 | 1 KiB      |       28.04 |
> |  9 | 1 MiB    |  0.77 | 2 KiB      |       41.16 |
> | 10 | 2 MiB    |  0.58 | 4 KiB      |       54.64 |
> | 11 | 4 MiB    |  0.50 | 8 KiB      |       63.38 |
> | 12 | 8 MiB    |  0.46 | 16 KiB     |       68.89 |
> | 13 | 16 MiB   |  0.44 | 32 KiB     |       72.02 |
> | 14 | 32 MiB   |  0.44 | 64 KiB     |       72.02 |
> | 15 | 64 MiB   |  0.45 | 128 KiB    |       70.42 |
> 
> 3) copy /proc/vmcore with mmap() on /dev/oldmem
> 
> I posted another patch series for mmap() on /dev/oldmem a few weeks ago.
> See: https://lkml.org/lkml/2013/2/3/431
> 
> Next is the table shown on the post showing the benchmark.
> 
> |  n | map size |  time | page table | performance |
> |    |          | (sec) |            |   [GiB/sec] |
> |----+----------+-------+------------+-------------|
> |  1 | 4 KiB    | 41.86 | 8 iB       |        0.76 |
> |  2 | 8 KiB    | 25.43 | 16 iB      |        1.25 |
> |  3 | 16 KiB   | 13.28 | 32 iB      |        2.39 |
> |  4 | 32 KiB   |  7.20 | 64 iB      |        4.40 |
> |  5 | 64 KiB   |  3.45 | 128 iB     |        9.19 |
> |  6 | 128 KiB  |  1.82 | 256 iB     |       17.42 |
> |  7 | 256 KiB  |  1.03 | 512 iB     |       30.78 |
> |  8 | 512 KiB  |  0.61 | 1K iB      |       51.97 |
> |  9 | 1 MiB    |  0.41 | 2K iB      |       77.32 |
> | 10 | 2 MiB    |  0.32 | 4K iB      |       99.06 |
> | 11 | 4 MiB    |  0.27 | 8K iB      |      117.41 |
> | 12 | 8 MiB    |  0.24 | 16 KiB     |      132.08 |
> | 13 | 16 MiB   |  0.23 | 32 KiB     |      137.83 |
> | 14 | 32 MiB   |  0.22 | 64 KiB     |      144.09 |
> | 15 | 64 MiB   |  0.22 | 128 KiB    |      144.09 |
> 
> = Discussion
> 
> - For small map sizes, mmap() shows performance degradation due to
>   frequent page table modifications and TLB flushes, similar to the
>   read_oldmem() case. But for large map sizes the performance is
>   clearly improved.
> 
>   Each application needs to choose an appropriate map size for its
>   preferred performance.
> 
> - mmap() on /dev/oldmem appears better than mmap() on /proc/vmcore,
>   but the actual processing involves not only copying but also I/O
>   work, so this difference is not a problem.

To keep the makedumpfile code simple, I would rather not use
/dev/oldmem as another input interface, and I hope that we can get
enough performance with /proc/vmcore alone.

> - Both mmap() cases show drastically better performance than the
>   previous RFC patch set's roughly 2.5 [GiB/sec], which mapped all
>   dump target memory into the kernel direct mapping address space.
>   This is because there is no longer a memcpy() from kernel-space
>   to user-space.
> 
> Design
> ======
> 
> = Support Range
> 
> - mmap() on /proc/vmcore is supported on the ELF64 interface only.
>   The ELF32 interface is used only if the dump target size is less
>   than 4GB, where the existing read() interface performs well enough.
> 
> = Change of /proc/vmcore format
> 
> To meet mmap()'s page-size boundary requirement, /proc/vmcore has
> changed its layout and now places its objects on page-size boundaries.
> 
> - Allocate the buffer for ELF headers on a page-size boundary.
>   => See [PATCH 01/13].
> 
> - ELF note segments scattered across old memory are copied into a
>   single page-size aligned buffer in the 2nd kernel, which is then
>   remapped to user-space.
>   => See [PATCH 09/13].
>   
> - The head and/or tail pages of memory chunks are also copied into
>   the 2nd kernel if either of their ends is not page-size aligned.
>   => See [PATCH 12/13].
> 
> = 32-bit PAE limitation
> 
> - On 32-bit PAE kernels, mmap_vmcore() can handle at most 16TB of
>   memory, since remap_pfn_range()'s third argument, pfn, is only
>   32 bits wide on x86-32, being defined as unsigned long.
> 
> TODO
> ====
> 
> - fix makedumpfile to use mmap() on /proc/vmcore and benchmark it to
>   confirm whether we can see enough performance improvement.

As a first step, I'll make a prototype patch for benchmarking unless you
have already done it.


Thanks
Atsushi Kumagai

> 
> Test
> ====
> 
> Done on x86-64, x86-32 both with 1GB and over 4GB memory environments.
> 
> ---
> 
> HATAYAMA Daisuke (13):
>       vmcore: introduce mmap_vmcore()
>       vmcore: copy non page-size aligned head and tail pages in 2nd kernel
>       vmcore: count holes generated by round-up operation for vmcore size
>       vmcore: round-up offset of vmcore object in page-size boundary
>       vmcore: copy ELF note segments in buffer on 2nd kernel
>       vmcore: remove unused helper function
>       vmcore: modify read_vmcore() to read buffer on 2nd kernel
>       vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
>       vmcore: modify ELF32 code according to new type
>       vmcore: introduce types for objects copied in 2nd kernel
>       vmcore: fill unused part of buffer for ELF headers with 0
>       vmcore: round up buffer size of ELF headers by PAGE_SIZE
>       vmcore: allocate buffer for ELF headers on page-size alignment
> 
> 
>  fs/proc/vmcore.c        |  408 +++++++++++++++++++++++++++++++++++------------
>  include/linux/proc_fs.h |   11 +
>  2 files changed, 313 insertions(+), 106 deletions(-)
> 
> -- 
> 
> Thanks.
> HATAYAMA, Daisuke

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 01/13] vmcore: allocate buffer for ELF headers on page-size alignment
  2013-02-14 10:11   ` HATAYAMA Daisuke
@ 2013-02-15 15:01     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 15:01 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:11:48PM +0900, HATAYAMA Daisuke wrote:

[..]

I think it is a good idea to copy Andrew Morton on this patch series.
Generally he is the one who pulls in all the kexec/kdump related patches.

> ---
> 
>  fs/proc/vmcore.c |   30 +++++++++++++++++++++---------
>  1 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 0d5071d..85714c3 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -30,6 +30,7 @@ static LIST_HEAD(vmcore_list);
>  /* Stores the pointer to the buffer containing kernel elf core headers. */
>  static char *elfcorebuf;
>  static size_t elfcorebuf_sz;
> +static size_t elfcorebuf_sz_orig;
>  
>  /* Total size of vmcore file. */
>  static u64 vmcore_size;
> @@ -560,26 +561,31 @@ static int __init parse_crash_elf64_headers(void)
>  
>  	/* Read in all elf headers. */
>  	elfcorebuf_sz = sizeof(Elf64_Ehdr) + ehdr.e_phnum * sizeof(Elf64_Phdr);
> -	elfcorebuf = kmalloc(elfcorebuf_sz, GFP_KERNEL);
> +	elfcorebuf_sz_orig = elfcorebuf_sz;
> +	elfcorebuf = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
> +					       get_order(elfcorebuf_sz));
							    ^^^
Just a minor nit. Can you use elfcorebuf_sz_orig instead of elfcorebuf_sz
for the allocation? That keeps it in line with free_pages() later:
elfcorebuf_sz_orig worth of pages are allocated, and the same size is freed.

Thanks
Vivek


* Re: [PATCH 02/13] vmcore: round up buffer size of ELF headers by PAGE_SIZE
  2013-02-14 10:11   ` HATAYAMA Daisuke
@ 2013-02-15 15:18     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 15:18 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:11:54PM +0900, HATAYAMA Daisuke wrote:
> To satisfy mmap()'s page-size boundary requirement, round up the
> buffer size of the ELF headers to PAGE_SIZE. The resulting value
> becomes the offset of the ELF note segments and is assigned to the
> unique PT_NOTE program header entry.
> 
> Also, code that assumed the previous ELF headers' size is updated to
> use this new rounded-up value.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> ---
> 
>  fs/proc/vmcore.c |    9 +++++----
>  1 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 85714c3..5010ead 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -313,7 +313,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  	phdr.p_flags   = 0;
>  	note_off = sizeof(Elf64_Ehdr) +
>  			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
> -	phdr.p_offset  = note_off;
> +	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>  	phdr.p_vaddr   = phdr.p_paddr = 0;
>  	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>  	phdr.p_align   = 0;
> @@ -331,6 +331,8 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>  	/* Modify e_phnum to reflect merged headers. */
>  	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>  

Hi Hatayama,

While reading the /proc/vmcore code again, I realized that we are making
a horrible assumption: that all PT_NOTE program headers prepared by
kexec-tools are contiguous. We also seem to be assuming that the PT_NOTE
phdrs immediately follow the ELF header.

        /* Add merged PT_NOTE program header*/
        tmp = elfptr + sizeof(Elf64_Ehdr);
        memcpy(tmp, &phdr, sizeof(phdr));
        tmp += sizeof(phdr);

        /* Remove unwanted PT_NOTE program headers. */
        i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
        *elfsz = *elfsz - i;
        memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));

I know I wrote this code, but now I realize that this is a very bad
assumption. We should not be assuming where the PT_NOTE headers are, nor
that they are contiguous.

This will require fixing. I think we just need to read the old ELF
headers into one buffer and prepare the new (merged) headers in a
separate buffer, instead of trying to make do with a single buffer.

If it is not too much trouble, can you please do this cleanup and rebase
your patches on top of it?

Thanks
Vivek


* Re: [PATCH 04/13] vmcore: introduce types for objects copied in 2nd kernel
  2013-02-14 10:12   ` HATAYAMA Daisuke
@ 2013-02-15 15:28     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 15:28 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:12:05PM +0900, HATAYAMA Daisuke wrote:
> Some parts of old memory need to be copied into buffers in the 2nd
> kernel so they can be remapped to user-space. To distinguish objects
> in a buffer in the 2nd kernel from those in old memory, enum
> vmcore_type is introduced: an object in a buffer in the 2nd kernel has
> the VMCORE_2ND_KERNEL type, and one in old memory has the
> VMCORE_OLD_MEMORY type.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> ---
> 
>  include/linux/proc_fs.h |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> index 32676b3..4b153ed 100644
> --- a/include/linux/proc_fs.h
> +++ b/include/linux/proc_fs.h
> @@ -97,11 +97,20 @@ struct kcore_list {
>  	int type;
>  };
>  
> +enum vmcore_type {
> +	VMCORE_OLD_MEMORY,
> +	VMCORE_2ND_KERNEL,

This VMCORE_2ND_KERNEL tag looks bad.

How about introducing an "unsigned int flags" element in "struct vmcore"
and setting the flag MEM_TYPE_OLDMEM for any contents which come from
oldmem?

If MEM_TYPE_OLDMEM is not set, it is assumed that the contents are to be
fetched from the current kernel using the pointer vmcore->buf.

BTW, which elements do you need to copy from the first kernel?

> +};
> +
>  struct vmcore {
>  	struct list_head list;
> -	unsigned long long paddr;
> +	union {
> +		unsigned long long paddr;
> +		char *buf;
> +	};
>  	unsigned long long size;
>  	loff_t offset;
> +	enum vmcore_type type;
>  };
>  

Thanks
Vivek

>  #ifdef CONFIG_PROC_FS


* Re: [PATCH 05/13] vmcore: modify ELF32 code according to new type
  2013-02-14 10:12   ` HATAYAMA Daisuke
@ 2013-02-15 15:30     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 15:30 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:12:10PM +0900, HATAYAMA Daisuke wrote:
> On ELF32, mmap() is not supported. All vmcore objects are in old
> memory.

This is odd. Why can't we support mmap() when 32-bit headers have been
prepared?

Thanks
Vivek

> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> ---
> 
>  fs/proc/vmcore.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 43d338a..7e3f922 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -389,6 +389,7 @@ static int __init merge_note_headers_elf32(char *elfptr, size_t *elfsz,
>  			kfree(notes_section);
>  			return -ENOMEM;
>  		}
> +		new->type = VMCORE_OLD_MEMORY;
>  		new->paddr = phdr_ptr->p_offset;
>  		new->size = real_sz;
>  		list_add_tail(&new->list, vc_list);
> @@ -486,6 +487,7 @@ static int __init process_ptload_program_headers_elf32(char *elfptr,
>  		new = get_new_element();
>  		if (!new)
>  			return -ENOMEM;
> +		new->type = VMCORE_OLD_MEMORY;
>  		new->paddr = phdr_ptr->p_offset;
>  		new->size = phdr_ptr->p_memsz;
>  		list_add_tail(&new->list, vc_list);


* Re: [PATCH 06/13] vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
  2013-02-14 10:12   ` HATAYAMA Daisuke
@ 2013-02-15 15:32     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 15:32 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:12:16PM +0900, HATAYAMA Daisuke wrote:
> A vmcore object has a buffer in the 2nd kernel if it has the
> VMCORE_2ND_KERNEL type; that buffer needs to be freed.
> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> ---
> 
>  fs/proc/vmcore.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 7e3f922..77e0a0e 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -735,6 +735,15 @@ void vmcore_cleanup(void)
>  		struct vmcore *m;
>  
>  		m = list_entry(pos, struct vmcore, list);
> +
> +		switch (m->type) {
> +		case VMCORE_OLD_MEMORY:
> +			break;
> +		case VMCORE_2ND_KERNEL:
> +			free_pages((unsigned long)m->buf, get_order(m->size));
> +			break;

I think the order of the patches is a little wrong. None of the patches
so far does any memory allocation for VMCORE_2ND_KERNEL, yet we are
already freeing the memory that will be allocated in future patches.
Maybe just merge this patch with the one that does the allocation; no
need to separate it out.

Thanks
Vivek


* Re: [PATCH 07/13] vmcore: modify read_vmcore() to read buffer on 2nd kernel
  2013-02-14 10:12   ` HATAYAMA Daisuke
@ 2013-02-15 15:51     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 15:51 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:12:21PM +0900, HATAYAMA Daisuke wrote:
> If a given vmcore object has the VMCORE_2ND_KERNEL type, the target
> data is in a buffer in the 2nd kernel.
> 

Looks like this patch is doing two things:

- Clean up how the read is performed: get rid of map_offset_to_paddr(),
  open-code it, and use list_for_each_entry().

- Read some memory from the current kernel if VMCORE_2ND_KERNEL is set.

Can you break it down into two patches?

Thanks
Vivek

> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> ---
> 
>  fs/proc/vmcore.c |   64 ++++++++++++++++++++++++++----------------------------
>  1 files changed, 31 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 77e0a0e..4125a65 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -146,8 +146,7 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
>  {
>  	ssize_t acc = 0, tmp;
>  	size_t tsz;
> -	u64 start, nr_bytes;
> -	struct vmcore *curr_m = NULL;
> +	struct vmcore *m;
>  
>  	if (buflen == 0 || *fpos >= vmcore_size)
>  		return 0;
> @@ -173,39 +172,38 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
>  			return acc;
>  	}
>  
> -	start = map_offset_to_paddr(*fpos, &vmcore_list, &curr_m);
> -	if (!curr_m)
> -        	return -EINVAL;
> -	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
> -		tsz = buflen;
> -
> -	/* Calculate left bytes in current memory segment. */
> -	nr_bytes = (curr_m->size - (start - curr_m->paddr));
> -	if (tsz > nr_bytes)
> -		tsz = nr_bytes;
> -
> -	while (buflen) {
> -		tmp = read_from_oldmem(buffer, tsz, &start, 1);
> -		if (tmp < 0)
> -			return tmp;
> -		buflen -= tsz;
> -		*fpos += tsz;
> -		buffer += tsz;
> -		acc += tsz;
> -		if (start >= (curr_m->paddr + curr_m->size)) {
> -			if (curr_m->list.next == &vmcore_list)
> -				return acc;	/*EOF*/
> -			curr_m = list_entry(curr_m->list.next,
> -						struct vmcore, list);
> -			start = curr_m->paddr;
> +	list_for_each_entry(m, &vmcore_list, list) {
> +		if (*fpos < m->offset + m->size) {
> +			tsz = m->offset + m->size - *fpos;
> +			if (buflen < tsz)
> +				tsz = buflen;
> +			switch (m->type) {
> +			case VMCORE_OLD_MEMORY: {
> +				u64 paddr = m->paddr + *fpos - m->offset;
> +
> +				tmp = read_from_oldmem(buffer, tsz, &paddr, 1);
> +				if (tmp < 0)
> +					return tmp;
> +				break;
> +			}
> +			case VMCORE_2ND_KERNEL:
> +				if (copy_to_user(buffer,
> +						 m->buf + (*fpos - m->offset),
> +						 tsz))
> +					return -EFAULT;
> +				break;
> +			}
> +			buflen -= tsz;
> +			*fpos += tsz;
> +			buffer += tsz;
> +			acc += tsz;
> +
> +			/* leave now if filled buffer already */
> +			if (buflen == 0)
> +				return acc;
>  		}
> -		if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
> -			tsz = buflen;
> -		/* Calculate left bytes in current memory segment. */
> -		nr_bytes = (curr_m->size - (start - curr_m->paddr));
> -		if (tsz > nr_bytes)
> -			tsz = nr_bytes;
>  	}
> +
>  	return acc;
>  }
>  


* Re: [PATCH 08/13] vmcore: remove unused helper function
  2013-02-14 10:12   ` HATAYAMA Daisuke
@ 2013-02-15 15:52     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 15:52 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:12:27PM +0900, HATAYAMA Daisuke wrote:
> Remove map_offset_to_paddr(), which is no longer used.

This, along with the change to the read logic, should be one patch. On
top of it should be a second patch that reads some parts of memory from
the current kernel instead of the old kernel.

Thanks
Vivek

> 
> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> ---
> 
>  fs/proc/vmcore.c |   21 ---------------------
>  1 files changed, 0 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 4125a65..3aedb52 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -117,27 +117,6 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
>  	return read;
>  }
>  
> -/* Maps vmcore file offset to respective physical address in memroy. */
> -static u64 map_offset_to_paddr(loff_t offset, struct list_head *vc_list,
> -					struct vmcore **m_ptr)
> -{
> -	struct vmcore *m;
> -	u64 paddr;
> -
> -	list_for_each_entry(m, vc_list, list) {
> -		u64 start, end;
> -		start = m->offset;
> -		end = m->offset + m->size - 1;
> -		if (offset >= start && offset <= end) {
> -			paddr = m->paddr + offset - start;
> -			*m_ptr = m;
> -			return paddr;
> -		}
> -	}
> -	*m_ptr = NULL;
> -	return 0;
> -}
> -
>  /* Read from the ELF header and then the crash dump. On error, negative value is
>   * returned otherwise number of bytes read are returned.
>   */


* Re: [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel
  2013-02-14 10:12   ` HATAYAMA Daisuke
@ 2013-02-15 16:53     ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-15 16:53 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Thu, Feb 14, 2013 at 07:12:32PM +0900, HATAYAMA Daisuke wrote:
> Objects exported as ELF note segments are in fact located apart from
> each other in old memory, but in /proc/vmcore they are exported as a
> single ELF note segment. To satisfy mmap()'s page-size boundary
> requirement, copy them into a page-size aligned buffer allocated by
> __get_free_pages() in the 2nd kernel and remap that buffer to
> user-space.
> 
> The buffer for the ELF note segments is added to vmcore_list as an
> object of VMCORE_2ND_KERNEL type.
> 
> The copy of the ELF note segments is done in two passes: the first
> pass calculates the real total size of the ELF note segments, and the
> second pass copies the segment data into a buffer of that size.

OK, this is the part I am not very happy with. I don't like the idea of
copying notes into the second kernel; it has the potential to bloat our
memory usage requirements there.

For example, we allocate a 4K page for each CPU, so on a huge machine
with, say, 4096 CPUs, 16MB more memory is required. Not that it is a big
concern even on a 4096-CPU machine, but if we can avoid copying notes
from the previous kernel, it will be good.

So the problem is that the note size from the previous kernel might not
be page aligned, and in the /proc/vmcore view all the notes are supposed
to be contiguous.

Thinking out loud:

- Can we introduce multiple PT_NOTE program headers, one for each piece
  of note data? I am not sure if this will break existing user-space
  tools like gdb, crash, etc.

- Or can we pad the notes with a new note type, say "VMCORE_PAD"? This
  is similar to "VMCOREINFO", except that it is used for padding to make
  sure the notes can be page aligned. User-space tools should simply
  ignore the VMCORE_PAD notes and move on to the next note.

I like the second idea better, and given that gdb did not break with the
introduction of the "VMCOREINFO" note type, it should not break when we
introduce another note type.

If this works, you wouldn't have to copy the notes into the second
kernel.

Eric, do you have any thoughts on this? What makes more sense?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 66+ messages in thread


* RE: [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore
  2013-02-15  3:57   ` Atsushi Kumagai
@ 2013-02-18  0:16     ` Hatayama, Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: Hatayama, Daisuke @ 2013-02-18  0:16 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: kexec, linux-kernel, lisa.mitchell, ebiederm, cpw, vgoyal

From: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Subject: Re: [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore
Date: Fri, 15 Feb 2013 12:57:01 +0900

> On Thu, 14 Feb 2013 19:11:43 +0900
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:

<cut>
>> TODO
>> ====
>> 
>> - fix makedumpfile to use mmap() on /proc/vmcore and benchmark it to
>>   confirm whether we can see enough performance improvement.
> 
> As a first step, I'll make a prototype patch for benchmarking unless you
> have already done it.
> 

I have an idea, but I've not started developing it yet.

I think there are two points we should optimize. One is
write_kdump_pages(), which reads the target page frames, compresses
them if necessary, and writes each page frame's data in order; the
other is __exclude_unnecessary_pages(), which reads the mem_map array
into page_cache and processes it for filtering.

Optimising the former via mmap() seems trivial, but we have to
consider the latter case more, since mem_map is virtually contiguous
but not guaranteed to be physically contiguous; it is mapped in the
virtual memory map region. Hence, the current implementation reads the
mem_map array one 4KB page at a time, with a virtual-to-physical
translation for each page. This is critical for performance and not
suitable for optimization by mmap(). We should fix this anyway.

My idea here is to exploit the fact that the virtual memory map
region is actually mapped using PMD-level page entries, i.e. 4MB
pages, if the processor in use supports large pages. Because of this,
the pages reached through each translation are guaranteed to be
physically contiguous over at least 4MB. Looking at the benchmark, the
performance improvement is already saturated in the 4MB case. So I
guess we can see enough performance improvement by mmap()ing the
mem_map array in these 4MB units.
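As a rough sketch of that access pattern (illustrative only; CHUNK and
the function name are my own invention, and a real implementation would
issue one mmap() of /proc/vmcore per window):

```c
#define CHUNK (4UL << 20)	/* 4 MiB: the PMD mapping unit above */

/* Split [start, start + len) into windows that never cross a 4MB
 * boundary; each window is physically contiguous under the assumption
 * above, so each could be covered by a single mmap() call.
 * Returns the number of windows. */
unsigned long count_mmap_windows(unsigned long start, unsigned long len)
{
	unsigned long n = 0, pos = start, end = start + len;

	while (pos < end) {
		unsigned long win_end = (pos / CHUNK + 1) * CHUNK;

		if (win_end > end)
			win_end = end;
		/* an mmap() of [pos, win_end) would go here */
		pos = win_end;
		n++;
	}
	return n;
}
```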

Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 66+ messages in thread


* Re: [PATCH 02/13] vmcore: round up buffer size of ELF headers by PAGE_SIZE
  2013-02-15 15:18     ` Vivek Goyal
@ 2013-02-18 15:58       ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-18 15:58 UTC (permalink / raw)
  To: vgoyal; +Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH 02/13] vmcore: round up buffer size of ELF headers by PAGE_SIZE
Date: Fri, 15 Feb 2013 10:18:21 -0500

> On Thu, Feb 14, 2013 at 07:11:54PM +0900, HATAYAMA Daisuke wrote:
>> To satisfy mmap() page-size boundary requirement, reound up buffer
>> size of ELF headers by PAGE_SIZE. The resulting value becomes offset
>> of ELF note segments and it's assigned in unique PT_NOTE program
>> header entry.
>> 
>> Also, some part that assumes past ELF headers' size is replaced by
>> this new rounded-up value.
>> 
>> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
>> ---
>> 
>>  fs/proc/vmcore.c |    9 +++++----
>>  1 files changed, 5 insertions(+), 4 deletions(-)
>> 
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 85714c3..5010ead 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -313,7 +313,7 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>>  	phdr.p_flags   = 0;
>>  	note_off = sizeof(Elf64_Ehdr) +
>>  			(ehdr_ptr->e_phnum - nr_ptnote +1) * sizeof(Elf64_Phdr);
>> -	phdr.p_offset  = note_off;
>> +	phdr.p_offset  = roundup(note_off, PAGE_SIZE);
>>  	phdr.p_vaddr   = phdr.p_paddr = 0;
>>  	phdr.p_filesz  = phdr.p_memsz = phdr_sz;
>>  	phdr.p_align   = 0;
>> @@ -331,6 +331,8 @@ static int __init merge_note_headers_elf64(char *elfptr, size_t *elfsz,
>>  	/* Modify e_phnum to reflect merged headers. */
>>  	ehdr_ptr->e_phnum = ehdr_ptr->e_phnum - nr_ptnote + 1;
>>  
> 
> Hi Hatayama,
> 
> While reading the /proc/vmcore code again, I realized that we are making
> a horrible assumption. And that is that all PT_NOTE program headers
> prepared by kexec-tools are contiguous. And we also seem to be assuming
> that all PT_NOTE phdrs are following immediately Elf Header.
> 
>         /* Add merged PT_NOTE program header*/
>         tmp = elfptr + sizeof(Elf64_Ehdr);
>         memcpy(tmp, &phdr, sizeof(phdr));
>         tmp += sizeof(phdr);
> 
>         /* Remove unwanted PT_NOTE program headers. */
>         i = (nr_ptnote - 1) * sizeof(Elf64_Phdr);
>         *elfsz = *elfsz - i;
>         memmove(tmp, tmp+i, ((*elfsz)-sizeof(Elf64_Ehdr)-sizeof(Elf64_Phdr)));
> 
> I know I wrote this code but now I realize that this is very bad
> assumption. We should not be assuming where PT_NOTE headers are and
> also should not be assuming that these are contiguous.
> 
> This will require fixing. I think we just need to read old elf headers
> in a buffer and prepare new headers (merged one) in a separate buffer
> instead of trying to make do with single buffer.
> 
> If it is not too much of trouble, can you please do this cleanup and
> rebase your patches on top of it.
> 
> Thanks
> Vivek

Yes, I'll do this.
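As a rough first cut, I imagine the separate-buffer merge looking
something like this (a user-space sketch with invented names, not the
actual fs/proc/vmcore.c code):

```c
#include <elf.h>
#include <string.h>

/* Copy the ELF header and all non-PT_NOTE program headers from the old
 * headers buffer into a fresh buffer, then prepend one merged PT_NOTE
 * phdr covering note_sz bytes at file offset note_off.  No assumption
 * is made about where the old PT_NOTE phdrs sit or whether they are
 * contiguous.  Returns the new e_phnum. */
int merge_elf64_headers(const char *oldbuf, char *newbuf,
			Elf64_Off note_off, Elf64_Xword note_sz)
{
	const Elf64_Ehdr *oe = (const Elf64_Ehdr *)oldbuf;
	const Elf64_Phdr *op = (const Elf64_Phdr *)(oldbuf + oe->e_phoff);
	Elf64_Ehdr *ne = (Elf64_Ehdr *)newbuf;
	Elf64_Phdr *np;
	int i, n = 0;

	*ne = *oe;
	ne->e_phoff = sizeof(*ne);
	np = (Elf64_Phdr *)(newbuf + ne->e_phoff);

	/* the single merged PT_NOTE entry comes first */
	memset(&np[n], 0, sizeof(np[n]));
	np[n].p_type   = PT_NOTE;
	np[n].p_offset = note_off;
	np[n].p_filesz = np[n].p_memsz = note_sz;
	n++;

	/* keep every non-PT_NOTE phdr, wherever it was in the old table */
	for (i = 0; i < oe->e_phnum; i++)
		if (op[i].p_type != PT_NOTE)
			np[n++] = op[i];

	ne->e_phnum = n;
	return n;
}
```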

Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 66+ messages in thread


* Re: [PATCH 04/13] vmcore: introduce types for objects copied in 2nd kernel
  2013-02-15 15:28     ` Vivek Goyal
@ 2013-02-18 16:06       ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-18 16:06 UTC (permalink / raw)
  To: vgoyal; +Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH 04/13] vmcore: introduce types for objects copied in 2nd kernel
Date: Fri, 15 Feb 2013 10:28:28 -0500

> On Thu, Feb 14, 2013 at 07:12:05PM +0900, HATAYAMA Daisuke wrote:
>> Some parts of old memory need to be copied in buffers on 2nd kernel to
>> be remapped to user-space. To distinguish objects in the buffer on 2nd
>> kernel and the ones on old memory, enum vmcore_type is introduced: the
>> object in the buffer on 2nd kernel has VMCORE_2ND_KERNEL type, and the
>> one on old memory has VMCORE_OLD_MEMORY type.
>> 
>> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
>> ---
>> 
>>  include/linux/proc_fs.h |   11 ++++++++++-
>>  1 files changed, 10 insertions(+), 1 deletions(-)
>> 
>> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
>> index 32676b3..4b153ed 100644
>> --- a/include/linux/proc_fs.h
>> +++ b/include/linux/proc_fs.h
>> @@ -97,11 +97,20 @@ struct kcore_list {
>>  	int type;
>>  };
>>  
>> +enum vmcore_type {
>> +	VMCORE_OLD_MEMORY,
>> +	VMCORE_2ND_KERNEL,
> 
> This VMCORE_2ND_KERNEL tag looks bad.
> 
> How about introducing a "unsigned int flag" element in "struct vmcore"
> and set the flag MEM_TYPE_OLDMEM for any contents which come from oldmem.
> 
> If MEM_TYPE_OLDMEM is not set, it is assumed that contents are to be
> fetched from current kernel using pointer vmcore->buf.

This sounds strange to me. So far there have been no contents to be
fetched from the current kernel. So an object with MEM_TYPE_OLDMEM
seems more normal than one without it. Should we prepare a special
type for objects in the 2nd kernel?

Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 66+ messages in thread


* Re: [PATCH 05/13] vmcore: modify ELF32 code according to new type
  2013-02-15 15:30     ` Vivek Goyal
@ 2013-02-18 16:11       ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-18 16:11 UTC (permalink / raw)
  To: vgoyal; +Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH 05/13] vmcore: modify ELF32 code according to new type
Date: Fri, 15 Feb 2013 10:30:48 -0500

> On Thu, Feb 14, 2013 at 07:12:10PM +0900, HATAYAMA Daisuke wrote:
>> On elf32 mmap() is not supported. All vmcore objects are in old
>> memory.
> 
> This is odd. Why can't we support mmap() when 32bit headers have been
> prepared?
> 
> Thanks
> Vivek
> 

I didn't think there was much use case for the ELF32 interface, as I
wrote in the last patch's description, but OK, I'll implement it for
the 32-bit interface too.

Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 66+ messages in thread


* Re: [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel
  2013-02-15 16:53     ` Vivek Goyal
@ 2013-02-18 17:02       ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-02-18 17:02 UTC (permalink / raw)
  To: vgoyal; +Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel
Date: Fri, 15 Feb 2013 11:53:27 -0500

> On Thu, Feb 14, 2013 at 07:12:32PM +0900, HATAYAMA Daisuke wrote:
>> Objects exported from ELF note segments are in fact located apart from
>> each other on old memory. But on /proc/vmcore they are exported as a
>> single ELF note segment. To satisfy mmap()'s page-size boundary
>> requirement, copy them in a page-size aligned buffer allocated by
>> __get_free_pages() on 2nd kernel and remap the buffer to user-space.
>> 
>> The buffer for ELF note segments is added to vmcore_list as the object
>> of VMCORE_2ND_KERNEL type.
>> 
>> Copy of ELF note segments is done in two pass: first pass tries to
>> calculate real total size of ELF note segments, and then 2nd pass
>> copies the segment data into the buffer of the real total size.
> 
> Ok, this is the part I am not very happy with. I don't like the idea
> of copying notes into second kernel. It has potential to bloat our
> memory usage requirements in second kernel.
> 
> For example, we allocate a 4K page for each cpu and a huge machine
> say 4096 cpu, 16MB of more memory is required. Not that it is big
> concern for a 4K cpu machine, still if we can avoid copying notes from
> previous kernel, it will be good.

I also estimated the worst case, but more optimistically than you
did. In my case, the estimate was at most less than 2MB on x86_64:
roundup(5112 cpus x sizeof(struct user_regs_struct), PAGE_SIZE) is
about 1MB. But I didn't consider other architectures, and I now notice
that s390 collects notes more aggressively.

> 
> So the problem is that note size from previous kernel might not be
> page aligned. And in /proc/vmcore view all the notes are supposed
> to be contiguous. 
> 
> Thinking loud.
> 
> - Can we introduce multiple PT_NOTE program headers. One for each note
>   data. I am not sure if this will break existing user space tools like
>   gdb, crash etc.
> 
> - Or can we pad the notes with a new note type say "VMCORE_PAD". This is
>   similar to "VMCOREINFO" just that it is used for padding to make sure
>   notes can be page aligned. User space tools should simple ignore
>   the VMCORE_PAD notes and move on to next note.
> 
> I think I like second idea better and given the fact that gdb did not
> break with introduction of "VMCOREINFO" note type, it should not break
> when we introduce another note type.
> 
> If this works, you don't have to copy notes in second kernel?

I also think the second one is better. Yes, in fact I have already
had a similar idea. It's certainly possible.

If I remember correctly, I have never seen tools that assume multiple
PT_NOTE entries. And tools like gdb interpret note information not
only by its contents but also by its order. For example, the n-th
NT_PRSTATUS is considered the n-th thread's or n-th CPU's data. It
seems to me that adding the case of multiple PT_NOTE entries could
make things unnecessarily complicated.

BTW, in the kexec/kdump design we never assume that the first and
second kernels are always the same. This means that we cannot assume
the first kernel always puts its notes on page-size boundaries in the
above way. So we need to check whether each note entry is on a
page-size boundary one by one, and if an entry is not, we need to copy
it into the 2nd kernel (and append the pad note to it). Copying is
still necessary in the worst case.

Anyway, what I'll do in the next version is, in summary:

- append pad notes to the notes on the 1st kernel on every
  architecture, and
- check whether each note is on a page-size boundary, and if not, copy
  it into the 2nd kernel and then append pad notes to it.
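The check in the second bullet could be sketched like this
(illustrative only; the structure and names are invented):

```c
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096UL
#endif

struct note_seg {
	unsigned long long paddr;	/* start of the note data in old memory */
	unsigned long long size;	/* bytes the note data occupies */
};

/* Count the note segments the 2nd kernel would still have to copy into
 * an aligned buffer because the 1st kernel did not place them on
 * page-size boundaries; the rest can be mapped directly. */
unsigned long count_notes_to_copy(const struct note_seg *segs,
				  unsigned long n)
{
	unsigned long i, bad = 0;

	for (i = 0; i < n; i++)
		if (segs[i].paddr % PAGE_SIZE || segs[i].size % PAGE_SIZE)
			bad++;
	return bad;
}
```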

Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 66+ messages in thread


* Re: [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel
  2013-02-18 17:02       ` HATAYAMA Daisuke
@ 2013-02-19 23:05         ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-19 23:05 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Tue, Feb 19, 2013 at 02:02:34AM +0900, HATAYAMA Daisuke wrote:

[..]
> Anyway, what I'll do in the next version, are in summary:
> 
> - append pad notes in each notes on the 1st kernel in every
>   architectures, and
> - check if each note is in page-size boundary, and if not so, copy it
>   in the 2nd kernel and then append pad notes to it.

Makes sense to me. Most of the time the first and second kernels are
the same, so no copying of notes will take place. Only in the corner
case of an older kernel being used as the first kernel will copying
take place.

One other possibility is to deny mmap() if the first kernel did not
prepare the notes on page-size boundaries.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 66+ messages in thread


* Re: [PATCH 04/13] vmcore: introduce types for objects copied in 2nd kernel
  2013-02-18 16:06       ` HATAYAMA Daisuke
@ 2013-02-19 23:07         ` Vivek Goyal
  -1 siblings, 0 replies; 66+ messages in thread
From: Vivek Goyal @ 2013-02-19 23:07 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Tue, Feb 19, 2013 at 01:06:31AM +0900, HATAYAMA Daisuke wrote:

[..]
> This sounds strange to me. There has not been contents to be fetched
> from current kenrel so far. So, object with MEM_TYPE_OLDMEM seems more
> normal than without. Should we prepare special type for objects in 2nd
> kernel?

Creating a flag for memory in the current kernel, and assuming the
memory comes from the old kernel if the flag is not set, is fine too.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 66+ messages in thread


* makedumpfile mmap() benchmark
  2013-02-14 10:11 ` HATAYAMA Daisuke
@ 2013-03-27  5:51   ` Jingbai Ma
  -1 siblings, 0 replies; 66+ messages in thread
From: Jingbai Ma @ 2013-03-27  5:51 UTC (permalink / raw)
  To: HATAYAMA Daisuke, vgoyal, ebiederm, cpw, kumagai-atsushi,
	Mitchell, Lisa (MCLinux in Fort Collins),
	akpm
  Cc: jingbai.ma, kexec, linux-kernel, cpw, kumagai-atsushi

Hi,

I have tested the makedumpfile mmap patch on a machine with 2TB of
memory; here are the test results:
Test environment:
Machine: HP ProLiant DL980 G7 with 2TB RAM.
CPU: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz (8 sockets, 10 cores)
(Only 1 CPU was enabled in the 2nd kernel)
Kernel: 3.9.0-rc3+ with mmap kernel patch v3
vmcore size: 2.0TB
Dump file size: 3.6GB
makedumpfile mmap branch with parameters: -c --message-level 23 -d 31
--map-size <map-size>
All measured times are taken from makedumpfile's debug messages.

As a comparison, I also tested with the original kernel and the
original makedumpfile 1.5.1 and 1.5.3.
I added all [Excluding unnecessary pages] and [Excluding free pages]
times together as "Filter pages", and [Copying data] as "Copy data" here.

| makedumpfile | Kernel                | map-size (KB) | Filter pages (s) | Copy data (s) | Total (s) |
|--------------+-----------------------+---------------+------------------+---------------+-----------|
| 1.5.1        | 3.7.0-0.36.el7.x86_64 | N/A           |           940.28 |       1269.25 |   2209.53 |
| 1.5.3        | 3.7.0-0.36.el7.x86_64 | N/A           |           380.09 |        992.77 |   1372.86 |
| 1.5.3        | v3.9-rc3              | N/A           |           197.77 |        892.27 |   1090.04 |
| 1.5.3+mmap   | v3.9-rc3+mmap         | 0             |           164.87 |        606.06 |    770.93 |
| 1.5.3+mmap   | v3.9-rc3+mmap         | 4             |            88.62 |        576.07 |    664.69 |
| 1.5.3+mmap   | v3.9-rc3+mmap         | 1024          |            83.66 |        477.23 |    560.89 |
| 1.5.3+mmap   | v3.9-rc3+mmap         | 2048          |            83.44 |        477.21 |    560.65 |
| 1.5.3+mmap   | v3.9-rc3+mmap         | 10240         |            83.84 |        476.56 |    560.40 |


Thanks,
Jingbai Ma


* Re: makedumpfile mmap() benchmark
  2013-03-27  5:51   ` Jingbai Ma
@ 2013-03-27  6:23     ` HATAYAMA Daisuke
  -1 siblings, 0 replies; 66+ messages in thread
From: HATAYAMA Daisuke @ 2013-03-27  6:23 UTC (permalink / raw)
  To: jingbai.ma
  Cc: vgoyal, ebiederm, cpw, kumagai-atsushi, lisa.mitchell, akpm,
	kexec, linux-kernel

From: Jingbai Ma <jingbai.ma@hp.com>
Subject: makedumpfile mmap() benchmark
Date: Wed, 27 Mar 2013 13:51:37 +0800

> Hi,
> 
> I have tested the makedumpfile mmap patch on a machine with 2TB
> memory, here is testing results:

Thanks for your benchmark. It's very helpful to see the benchmark on
different environments.

> Test environment:
> Machine: HP ProLiant DL980 G7 with 2TB RAM.
> CPU: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz (8 sockets, 10 cores)
> (Only 1 cpu was enabled the 2nd kernel)
> Kernel: 3.9.0-rc3+ with mmap kernel patch v3
> vmcore size: 2.0TB
> Dump file size: 3.6GB
> makedumpfile mmap branch with parameters: -c --message-level 23 -d 31
> --map-size <map-size>

To reduce the benchmark time, I recommend LZO or snappy compression
rather than zlib. zlib is used when the -c option is specified, and it
is too slow for crash dump use.

To build makedumpfile with support for each compression format, pass
USELZO=on or USESNAPPY=on to make after installing the necessary
libraries.

> All measured time from debug message of makedumpfile.
> 
> As a comparison, I also have tested with original kernel and original
> makedumpfile 1.5.1 and 1.5.3.
> I added all [Excluding unnecessary pages] and [Excluding free pages]
> time together as "Filter Pages", and [Copyying Data] as "Copy data"
> here.
> 
> makedumjpfile Kernel map-size (KB) Filter pages (s) Copy data (s)
> Total (s)
> 1.5.1	 3.7.0-0.36.el7.x86_64	N/A	940.28	1269.25	2209.53
> 1.5.3	 3.7.0-0.36.el7.x86_64	N/A	380.09	992.77	1372.86
> 1.5.3	v3.9-rc3	N/A	197.77	892.27	1090.04
> 1.5.3+mmap	v3.9-rc3+mmap	0	164.87	606.06	770.93
> 1.5.3+mmap	v3.9-rc3+mmap	4	88.62	576.07	664.69
> 1.5.3+mmap	v3.9-rc3+mmap	1024	83.66	477.23	560.89
> 1.5.3+mmap	v3.9-rc3+mmap	2048	83.44	477.21	560.65
> 1.5.3+mmap	v3.9-rc3+mmap	10240	83.84	476.56	560.4

Did you calculate "Filter pages" by adding the two sets of [Excluding
unnecessary pages] lines together? The first set is displayed by
get_num_dumpable_cyclic() during the calculation of the total number
of dumpable pages, which is later used to print the progress of
writing pages as a percentage.

For example, here is a log where the number of cycles is 3:

mem_map (16399)
  mem_map    : ffffea0801e00000
  pfn_start  : 20078000
  pfn_end    : 20080000
read /proc/vmcore with mmap()
STEP [Excluding unnecessary pages] : 13.703842 seconds <-- this part is by get_num_dumpable_cyclic()
STEP [Excluding unnecessary pages] : 13.842656 seconds
STEP [Excluding unnecessary pages] : 6.857910 seconds
STEP [Excluding unnecessary pages] : 13.554281 seconds <-- this part is by the main filtering processing.
STEP [Excluding unnecessary pages] : 14.103593 seconds
STEP [Excluding unnecessary pages] : 7.114239 seconds
STEP [Copying data               ] : 138.442116 seconds
Writing erase info...
offset_eraseinfo: 1f4680e40, size_eraseinfo: 0

Original pages  : 0x000000001ffc28a4
<cut>

So, get_num_dumpable_cyclic() actually performs a filtering operation,
but its time should not be included here.

If so, I guess each measured time would be about 42 seconds, right?
Then it's almost the same as the result I posted today: 35 seconds.

Thanks.
HATAYAMA, Daisuke



* Re: makedumpfile mmap() benchmark
  2013-03-27  6:23     ` HATAYAMA Daisuke
@ 2013-03-27  6:35       ` Jingbai Ma
  -1 siblings, 0 replies; 66+ messages in thread
From: Jingbai Ma @ 2013-03-27  6:35 UTC (permalink / raw)
  To: HATAYAMA Daisuke
  Cc: jingbai.ma, vgoyal, ebiederm, cpw, kumagai-atsushi,
	lisa.mitchell, akpm, kexec, linux-kernel

On 03/27/2013 02:23 PM, HATAYAMA Daisuke wrote:
> From: Jingbai Ma<jingbai.ma@hp.com>
> Subject: makedumpfile mmap() benchmark
> Date: Wed, 27 Mar 2013 13:51:37 +0800
>
>> Hi,
>>
>> I have tested the makedumpfile mmap patch on a machine with 2TB
>> memory, here is testing results:
>
> Thanks for your benchmark. It's very helpful to see the benchmark on
> different environments.

Thanks for your patch; it gives a great performance improvement, very
impressive!

>
>> Test environment:
>> Machine: HP ProLiant DL980 G7 with 2TB RAM.
>> CPU: Intel(R) Xeon(R) CPU E7- 2860  @ 2.27GHz (8 sockets, 10 cores)
>> (Only 1 cpu was enabled the 2nd kernel)
>> Kernel: 3.9.0-rc3+ with mmap kernel patch v3
>> vmcore size: 2.0TB
>> Dump file size: 3.6GB
>> makedumpfile mmap branch with parameters: -c --message-level 23 -d 31
>> --map-size<map-size>
>
> To reduce the benchmark time, I recommend LZO or snappy compressions
> rather than zlib. zlib is used when -c option is specified, and it's
> too slow for use of crash dump.

That's a very helpful suggestion; I will try again with the LZO/snappy
libraries.

>
> To build makedumpfile with each compression format supports, do
> USELZO=on or USESNAPPY=on after installing necessary libraries.
>
>> All measured time from debug message of makedumpfile.
>>
>> As a comparison, I also have tested with original kernel and original
>> makedumpfile 1.5.1 and 1.5.3.
>> I added all [Excluding unnecessary pages] and [Excluding free pages]
>> time together as "Filter Pages", and [Copyying Data] as "Copy data"
>> here.
>>
>> makedumjpfile Kernel map-size (KB) Filter pages (s) Copy data (s)
>> Total (s)
>> 1.5.1	 3.7.0-0.36.el7.x86_64	N/A	940.28	1269.25	2209.53
>> 1.5.3	 3.7.0-0.36.el7.x86_64	N/A	380.09	992.77	1372.86
>> 1.5.3	v3.9-rc3	N/A	197.77	892.27	1090.04
>> 1.5.3+mmap	v3.9-rc3+mmap	0	164.87	606.06	770.93
>> 1.5.3+mmap	v3.9-rc3+mmap	4	88.62	576.07	664.69
>> 1.5.3+mmap	v3.9-rc3+mmap	1024	83.66	477.23	560.89
>> 1.5.3+mmap	v3.9-rc3+mmap	2048	83.44	477.21	560.65
>> 1.5.3+mmap	v3.9-rc3+mmap	10240	83.84	476.56	560.4
>
> Did you calculate "Filter pages" by adding two [Excluding unnecessary
> pages] lines? The first one of the two line is displayed by
> get_num_dumpable_cyclic() during the calculation of the total number
> of dumpable pages, which is later used to print progress of writing
> pages in percentage.
>
> For example, here is the log, where the number of cycles is 3, and
>
> mem_map (16399)
>    mem_map    : ffffea0801e00000
>    pfn_start  : 20078000
>    pfn_end    : 20080000
> read /proc/vmcore with mmap()
> STEP [Excluding unnecessary pages] : 13.703842 seconds<-- this part is by get_num_dumpable_cyclic()
> STEP [Excluding unnecessary pages] : 13.842656 seconds
> STEP [Excluding unnecessary pages] : 6.857910 seconds
> STEP [Excluding unnecessary pages] : 13.554281 seconds<-- this part is by the main filtering processing.
> STEP [Excluding unnecessary pages] : 14.103593 seconds
> STEP [Excluding unnecessary pages] : 7.114239 seconds
> STEP [Copying data               ] : 138.442116 seconds
> Writing erase info...
> offset_eraseinfo: 1f4680e40, size_eraseinfo: 0
>
> Original pages  : 0x000000001ffc28a4
> <cut>
>
> So, get_num_dumpable_cyclic() actually does filtering operation but it
> should not be included here.
>
> If so, I guess each measured time would be about 42 seconds, right?
> Then, it's almost same as the result I posted today: 35 seconds.

Yes, I added them together. The following is one dump message log:
<Log>
makedumpfile -c --message-level 23 -d 31 --map-size 10240 /proc/vmcore /sysroot/var/crash/vmcore_10240

cyclic buffer size has been changed: 77661798 => 77661184
Excluding unnecessary pages        : [100 %] STEP [Excluding unnecessary pages] : 24.777717 seconds
Excluding unnecessary pages        : [100 %] STEP [Excluding unnecessary pages] : 17.291935 seconds
Excluding unnecessary pages        : [100 %] STEP [Excluding unnecessary pages] : 24.498559 seconds
Excluding unnecessary pages        : [100 %] STEP [Excluding unnecessary pages] : 17.278414 seconds
Copying data                       : [100 %] STEP [Copying data               ] : 476.563428 seconds


Original pages  : 0x000000001ffe874d
   Excluded pages   : 0x000000001f79429e
     Pages filled with zero  : 0x00000000002b4c9c
     Cache pages             : 0x00000000000493bc
     Cache pages + private   : 0x00000000000011f3
     User process data pages : 0x0000000000005c55
     Free pages              : 0x000000001f48f3fe
     Hwpoison pages          : 0x0000000000000000
   Remaining pages  : 0x00000000008544af
   (The number of pages is reduced to 1%.)
Memory Hole     : 0x000000001c0178b3
--------------------------------------------------
Total pages     : 0x000000003c000000
</Log>

>
> Thanks.
> HATAYAMA, Daisuke
>


-- 
Thanks,
Jingbai Ma


end of thread, other threads:[~2013-03-27  6:35 UTC | newest]

Thread overview: 66+ messages
2013-02-14 10:11 [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore HATAYAMA Daisuke
2013-02-14 10:11 ` HATAYAMA Daisuke
2013-02-14 10:11 ` [PATCH 01/13] vmcore: allocate buffer for ELF headers on page-size alignment HATAYAMA Daisuke
2013-02-14 10:11   ` HATAYAMA Daisuke
2013-02-15 15:01   ` Vivek Goyal
2013-02-15 15:01     ` Vivek Goyal
2013-02-14 10:11 ` [PATCH 02/13] vmcore: round up buffer size of ELF headers by PAGE_SIZE HATAYAMA Daisuke
2013-02-14 10:11   ` HATAYAMA Daisuke
2013-02-15 15:18   ` Vivek Goyal
2013-02-15 15:18     ` Vivek Goyal
2013-02-18 15:58     ` HATAYAMA Daisuke
2013-02-18 15:58       ` HATAYAMA Daisuke
2013-02-14 10:11 ` [PATCH 03/13] vmcore: fill unused part of buffer for ELF headers with 0 HATAYAMA Daisuke
2013-02-14 10:11   ` HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 04/13] vmcore: introduce types for objects copied in 2nd kernel HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-15 15:28   ` Vivek Goyal
2013-02-15 15:28     ` Vivek Goyal
2013-02-18 16:06     ` HATAYAMA Daisuke
2013-02-18 16:06       ` HATAYAMA Daisuke
2013-02-19 23:07       ` Vivek Goyal
2013-02-19 23:07         ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 05/13] vmcore: modify ELF32 code according to new type HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-15 15:30   ` Vivek Goyal
2013-02-15 15:30     ` Vivek Goyal
2013-02-18 16:11     ` HATAYAMA Daisuke
2013-02-18 16:11       ` HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 06/13] vmcore: modify vmcore clean-up function to free buffer on 2nd kernel HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-15 15:32   ` Vivek Goyal
2013-02-15 15:32     ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 07/13] vmcore: modify read_vmcore() to read " HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-15 15:51   ` Vivek Goyal
2013-02-15 15:51     ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 08/13] vmcore: remove unused helper function HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-15 15:52   ` Vivek Goyal
2013-02-15 15:52     ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-15 16:53   ` Vivek Goyal
2013-02-15 16:53     ` Vivek Goyal
2013-02-18 17:02     ` HATAYAMA Daisuke
2013-02-18 17:02       ` HATAYAMA Daisuke
2013-02-19 23:05       ` Vivek Goyal
2013-02-19 23:05         ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 10/13] vmcore: round-up offset of vmcore object in page-size boundary HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 11/13] vmcore: count holes generated by round-up operation for vmcore size HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 12/13] vmcore: copy non page-size aligned head and tail pages in 2nd kernel HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 13/13] vmcore: introduce mmap_vmcore() HATAYAMA Daisuke
2013-02-14 10:12   ` HATAYAMA Daisuke
2013-02-15  3:57 ` [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore Atsushi Kumagai
2013-02-15  3:57   ` Atsushi Kumagai
2013-02-18  0:16   ` Hatayama, Daisuke
2013-02-18  0:16     ` Hatayama, Daisuke
2013-03-27  5:51 ` makedumpfile mmap() benchmark Jingbai Ma
2013-03-27  5:51   ` Jingbai Ma
2013-03-27  6:23   ` HATAYAMA Daisuke
2013-03-27  6:23     ` HATAYAMA Daisuke
2013-03-27  6:35     ` Jingbai Ma
2013-03-27  6:35       ` Jingbai Ma
