From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
To: ebiederm@xmission.com, vgoyal@redhat.com, cpw@sgi.com,
    kumagai-atsushi@mxc.nes.nec.co.jp, lisa.mitchell@hp.com
Cc: kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore
Date: Thu, 14 Feb 2013 19:11:43 +0900
Message-ID: <20130214100945.22466.4172.stgit@localhost6.localdomain6>

Currently, reads from /proc/vmcore are done by read_oldmem(), which
calls ioremap/iounmap for every single page. For example, if memory is
1GB, ioremap/iounmap is called (1GB / 4KB) times, that is, 262144
times. This causes significant performance degradation.

To address the issue, this patch series implements mmap() on
/proc/vmcore to improve read performance. My simple benchmark shows an
improvement from about 200 [MiB/sec] to over 50.0 [GiB/sec].
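For reference, the per-page cost comes from the read path doing
roughly the following for every 4 KiB page. This is a simplified
sketch based on the x86 copy_oldmem_page() that backs this path; the
exact code varies by architecture and is untouched by this series:

/*
 * Simplified sketch of the existing per-page read path: each page of
 * old memory is ioremap'd, copied, and iounmap'd again.
 */
#include <linux/io.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/uaccess.h>

ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
			 unsigned long offset, int userbuf)
{
	void *vaddr;

	if (!csize)
		return 0;

	/* Map exactly one page of the old (crashed) kernel's memory. */
	vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
	if (!vaddr)
		return -ENOMEM;

	if (userbuf) {
		if (copy_to_user(buf, vaddr + offset, csize)) {
			iounmap(vaddr);
			return -EFAULT;
		}
	} else {
		memcpy(buf, vaddr + offset, csize);
	}

	/* Unmap again; for 1GB this map/unmap pair runs 262144 times. */
	iounmap(vaddr);
	return csize;
}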
Benchmark
=========

= Machine spec
- CPU: Intel(R) Xeon(R) CPU E7-4820 @ 2.00GHz (4 sockets, 8 cores) (*)
- memory: 32GB
- kernel: 3.8-rc6 with this patch
- vmcore size: 31.7GB

(*) Only 1 CPU is used in the 2nd kernel.

= Benchmark cases

1) Copy /proc/vmcore *WITHOUT* mmap()

$ time dd bs=4096 if=/proc/vmcore of=/dev/null
8307246+1 records in
8307246+1 records out
real    2m 31.50s
user    0m 1.06s
sys     2m 27.60s

So performance is 214.26 [MiB/sec].

2) Copy /proc/vmcore with mmap()

I ran the following command and recorded the real time:

$ for n in $(seq 1 15) ; do \
>   time copyvmcore2 --blocksize=$((4096 * (1 << (n - 1)))) /proc/vmcore /dev/null \
> done

where copyvmcore2 is an ad-hoc test tool that reads data from
/proc/vmcore via mmap() in the given block-size unit and writes it to
a file.
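copyvmcore2 itself is not posted; its core loop is along the following
lines. This is a hypothetical reconstruction for illustration only,
not the actual tool, with error handling abbreviated:

/*
 * Hypothetical sketch of a copyvmcore2-like tool: mmap() the source
 * in blocksize-sized pieces and write each mapping to the output.
 * Usage: copyvmcore2 --blocksize=<bytes> <source> <destination>
 */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	size_t bs;
	int in, out;
	struct stat st;
	off_t off;

	if (argc != 4)
		return 1;
	/* Assumes argv[1] is "--blocksize=<bytes>", page-aligned. */
	bs = (size_t)atol(argv[1] + sizeof("--blocksize=") - 1);
	in = open(argv[2], O_RDONLY);
	out = open(argv[3], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0 || fstat(in, &st) < 0)
		return 1;

	for (off = 0; off < st.st_size; off += bs) {
		size_t len = bs;
		void *p;

		if (off + (off_t)len > st.st_size)
			len = (size_t)(st.st_size - off);
		/* mmap() requires a page-aligned file offset. */
		p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, off);
		if (p == MAP_FAILED)
			return 1;
		if (write(out, p, len) != (ssize_t)len)
			return 1;
		munmap(p, len);	/* unmap before mapping the next block */
	}
	return 0;
}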
| n  | map size | time  | page table | performance |
|    |          | (sec) |            | [GiB/sec]   |
|----+----------+-------+------------+-------------|
|  1 | 4 KiB    | 78.35 | 8 B        |        0.40 |
|  2 | 8 KiB    | 45.29 | 16 B       |        0.70 |
|  3 | 16 KiB   | 23.82 | 32 B       |        1.33 |
|  4 | 32 KiB   | 12.90 | 64 B       |        2.46 |
|  5 | 64 KiB   |  6.13 | 128 B      |        5.17 |
|  6 | 128 KiB  |  3.26 | 256 B      |        9.72 |
|  7 | 256 KiB  |  1.86 | 512 B      |       17.04 |
|  8 | 512 KiB  |  1.13 | 1 KiB      |       28.04 |
|  9 | 1 MiB    |  0.77 | 2 KiB      |       41.16 |
| 10 | 2 MiB    |  0.58 | 4 KiB      |       54.64 |
| 11 | 4 MiB    |  0.50 | 8 KiB      |       63.38 |
| 12 | 8 MiB    |  0.46 | 16 KiB     |       68.89 |
| 13 | 16 MiB   |  0.44 | 32 KiB     |       72.02 |
| 14 | 32 MiB   |  0.44 | 64 KiB     |       72.02 |
| 15 | 64 MiB   |  0.45 | 128 KiB    |       70.42 |

3) Copy /proc/vmcore with mmap() on /dev/oldmem

I posted another patch series for mmap() on /dev/oldmem a few weeks
ago. See: https://lkml.org/lkml/2013/2/3/431

The following table from that post shows its benchmark results.

| n  | map size | time  | page table | performance |
|    |          | (sec) |            | [GiB/sec]   |
|----+----------+-------+------------+-------------|
|  1 | 4 KiB    | 41.86 | 8 B        |        0.76 |
|  2 | 8 KiB    | 25.43 | 16 B       |        1.25 |
|  3 | 16 KiB   | 13.28 | 32 B       |        2.39 |
|  4 | 32 KiB   |  7.20 | 64 B       |        4.40 |
|  5 | 64 KiB   |  3.45 | 128 B      |        9.19 |
|  6 | 128 KiB  |  1.82 | 256 B      |       17.42 |
|  7 | 256 KiB  |  1.03 | 512 B      |       30.78 |
|  8 | 512 KiB  |  0.61 | 1 KiB      |       51.97 |
|  9 | 1 MiB    |  0.41 | 2 KiB      |       77.32 |
| 10 | 2 MiB    |  0.32 | 4 KiB      |       99.06 |
| 11 | 4 MiB    |  0.27 | 8 KiB      |      117.41 |
| 12 | 8 MiB    |  0.24 | 16 KiB     |      132.08 |
| 13 | 16 MiB   |  0.23 | 32 KiB     |      137.83 |
| 14 | 32 MiB   |  0.22 | 64 KiB     |      144.09 |
| 15 | 64 MiB   |  0.22 | 128 KiB    |      144.09 |

= Discussion

- For small map sizes, the mmap() case shows performance degradation
  due to frequent page table modifications and TLB flushes, just as in
  the read_oldmem() case. For large map sizes, performance improves
  considerably. Each application needs to choose an appropriate map
  size for its preferred performance.

- mmap() on /dev/oldmem appears faster than mmap() on /proc/vmcore,
  but the latter's processing involves not only copying but also I/O
  work, so this difference is not a problem.

- Both mmap() cases show drastically better performance than the
  previous RFC patch set's roughly 2.5 [GiB/sec], which mapped all
  dump target memory into the kernel direct mapping address space.
  This is because there is no longer a memcpy() from kernel space to
  user space.

Design
======

= Supported range

- mmap() on /proc/vmcore is supported on the ELF64 interface only. The
  ELF32 interface is used only if the dump target size is less than
  4GB, where the existing read interface performs well enough.

= Change of /proc/vmcore format

To satisfy mmap()'s page-size boundary requirement, /proc/vmcore
changes its layout and now places its objects on page-size boundaries:

- The buffer for ELF headers is allocated on a page-size boundary.
  => See [PATCH 01/13].

- Note objects scattered across old memory are copied into a single
  page-size aligned buffer in the 2nd kernel, and that buffer is
  remapped to user space. => See [PATCH 09/13].

- The head and/or tail pages of memory chunks are also copied into the
  2nd kernel if either of their ends is not page-size aligned.
  => See [PATCH 12/13].

= 32-bit PAE limitation

- On 32-bit PAE, mmap_vmcore() can handle at most 16TB of memory,
  since remap_pfn_range()'s third argument, pfn, is an unsigned long
  and thus only 32 bits wide: 2^32 page frames * 4 KiB per page =
  16 TiB.
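Schematically, the mapping step at the heart of [PATCH 13/13] looks
like the sketch below. This is a simplified illustration, not the
patch itself: the real mmap_vmcore() walks the list of vmcore objects
and maps each piece separately; lookup_oldmem_paddr() here is a
hypothetical placeholder standing in for that walk.

/*
 * Simplified sketch of the mmap_vmcore() idea: map old memory into
 * user space with one remap_pfn_range() call per chunk instead of
 * per-page ioremap/iounmap plus memcpy().
 */
#include <linux/mm.h>

extern u64 vmcore_size;		/* total dump size, set at init */
u64 lookup_oldmem_paddr(u64 offset);	/* hypothetical helper */

static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
{
	size_t size = vma->vm_end - vma->vm_start;
	u64 offset = (u64)vma->vm_pgoff << PAGE_SHIFT;
	u64 paddr;

	if (offset + size > vmcore_size)
		return -EINVAL;

	/* The dump is strictly read-only. */
	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
		return -EPERM;

	/* Physical address of the old-memory chunk backing this range;
	 * in the real patch, one remap per contiguous piece. */
	paddr = lookup_oldmem_paddr(offset);

	if (remap_pfn_range(vma, vma->vm_start, paddr >> PAGE_SHIFT,
			    size, vma->vm_page_prot))
		return -EAGAIN;
	return 0;
}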
TODO
====

- Modify makedumpfile to use mmap() on /proc/vmcore and benchmark it
  to confirm that we get a sufficient performance improvement.

Test
====

Done on x86-64 and x86-32, both in 1GB and in over-4GB memory
environments.

---

HATAYAMA Daisuke (13):
      vmcore: introduce mmap_vmcore()
      vmcore: copy non page-size aligned head and tail pages in 2nd kernel
      vmcore: count holes generated by round-up operation for vmcore size
      vmcore: round-up offset of vmcore object in page-size boundary
      vmcore: copy ELF note segments in buffer on 2nd kernel
      vmcore: remove unused helper function
      vmcore: modify read_vmcore() to read buffer on 2nd kernel
      vmcore: modify vmcore clean-up function to free buffer on 2nd kernel
      vmcore: modify ELF32 code according to new type
      vmcore: introduce types for objects copied in 2nd kernel
      vmcore: fill unused part of buffer for ELF headers with 0
      vmcore: round up buffer size of ELF headers by PAGE_SIZE
      vmcore: allocate buffer for ELF headers on page-size alignment

 fs/proc/vmcore.c        | 408 +++++++++++++++++++++++++++++++++++------------
 include/linux/proc_fs.h |  11 +
 2 files changed, 313 insertions(+), 106 deletions(-)

--
Thanks.
HATAYAMA, Daisuke