* [PATCH] kdump, oldmem: support mmap on /dev/oldmem
@ 2013-02-04  4:59 Hatayama, Daisuke
  2013-02-05 15:12 ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: Hatayama, Daisuke @ 2013-02-04  4:59 UTC (permalink / raw)
  To: ebiederm, vgoyal, cpw, kumagai-atsushi, lisa.mitchell; +Cc: kexec, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7228 bytes --]

Support mmap() on /dev/oldmem to improve the performance of reading
/proc/vmcore. Currently, reads of /proc/vmcore are serviced by
read_oldmem(), which calls ioremap and iounmap once per page; for
example, if memory is 1GB, ioremap/iounmap is called (1GB / 4KB) times,
that is, 262144 times. This causes a big performance degradation.

With this patch, we saw an improvement on a simple benchmark from

  200 [MiB/sec] to over 100.00 [GiB/sec].

Benchmark
=========

= Machine spec
  - CPU: Intel(R) Xeon(R) CPU E7- 4820 @ 2.00GHz (4 sockets, 8 cores) (*)
  - memory: 32GB
  - kernel: 3.8-rc5 with this patch
  - vmcore size: 31.7GB

  (*) Only 1 CPU is used in the 2nd kernel.

= Benchmark Case

1) copy /proc/vmcore with mmap() on /dev/oldmem

  I ran the following command and recorded the real time:

  $ for n in $(seq 1 15) ; do \
  >   time copyvmcore --blocksize=$((4096 * (1 << (n - 1)))) /proc/vmcore /dev/null
  > done

  where copyvmcore is an ad-hoc test tool that parses the ELF headers
  and copies the memory they describe sequentially using mmap() on
  /dev/oldmem. See the attached file.
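
  For reference, below is a minimal sketch of such an mmap()-based copy
  loop over /dev/oldmem. It is not the attached copyvmcore tool: it
  skips the ELF parsing, and the function name and parameters are made
  up for illustration (it assumes a page-aligned start address and an
  x86_64 userland).

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  /* Copy [paddr, paddr + len) of old memory to 'out' in map_size chunks. */
  static int copy_oldmem(off_t paddr, size_t len, size_t map_size, FILE *out)
  {
          int fd = open("/dev/oldmem", O_RDONLY);

          if (fd < 0)
                  return -1;
          for (size_t done = 0; done < len; done += map_size) {
                  size_t chunk = len - done < map_size ? len - done : map_size;
                  /* The file offset is the physical address; it must be
                     page aligned, hence the page-aligned paddr assumption. */
                  void *p = mmap(NULL, chunk, PROT_READ, MAP_SHARED,
                                 fd, paddr + done);

                  if (p == MAP_FAILED) {
                          close(fd);
                          return -1;
                  }
                  fwrite(p, 1, chunk, out);
                  munmap(p, chunk);
          }
          close(fd);
          return 0;
  }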

|  n | map size |  time | page table | performance |
|    |          | (sec) |       size |   [GiB/sec] |
|----+----------+-------+------------+-------------|
|  1 | 4 KiB    | 41.86 | 8 B        |        0.76 |
|  2 | 8 KiB    | 25.43 | 16 B       |        1.25 |
|  3 | 16 KiB   | 13.28 | 32 B       |        2.39 |
|  4 | 32 KiB   |  7.20 | 64 B       |        4.40 |
|  5 | 64 KiB   |  3.45 | 128 B      |        9.19 |
|  6 | 128 KiB  |  1.82 | 256 B      |       17.42 |
|  7 | 256 KiB  |  1.03 | 512 B      |       30.78 |
|  8 | 512 KiB  |  0.61 | 1 KiB      |       51.97 |
|  9 | 1 MiB    |  0.41 | 2 KiB      |       77.32 |
| 10 | 2 MiB    |  0.32 | 4 KiB      |       99.06 |
| 11 | 4 MiB    |  0.27 | 8 KiB      |      117.41 |
| 12 | 8 MiB    |  0.24 | 16 KiB     |      132.08 |
| 13 | 16 MiB   |  0.23 | 32 KiB     |      137.83 |
| 14 | 32 MiB   |  0.22 | 64 KiB     |      144.09 |
| 15 | 64 MiB   |  0.22 | 128 KiB    |      144.09 |

2) copy /proc/vmcore without mmap() on /dev/oldmem

$ time dd bs=4096 if=/proc/vmcore of=/dev/null
8307246+1 records in
8307246+1 records out
real    2m 31.50s
user    0m 1.06s
sys     2m 27.60s

So the performance is 214.26 [MiB/sec] (31.7GB in about 151.5 seconds).

3) The benchmark of the previous patch

  See:

  http://lists.infradead.org/pipermail/kexec/2013-January/007758.html

  where an improvement to more than 2.5 [GiB/sec] was shown.

= Discussion

When the map size is small, there are many mmap() calls and we see the
same situation as in the ioremap() case. When the map size is large
enough, we see a drastic improvement. This is because the number of
mmap() calls becomes small enough that page table modification and TLB
flushing no longer matter; for example, with a 64 MiB map size the
31.7GB vmcore needs only about 500 mmap() calls, versus roughly 8.3
million ioremap() calls at 4 KiB granularity. Another reason why
performance is drastically better than with the previous patch is that
the memory copy from kernel space to user space is no longer performed.

The performance improvement saturates at a relatively small map size,
so the page table stays small (the page table column above is simply
(map size / 4 KiB) x 8 bytes of PTEs). I guess we don't need to support
large pages in remap_pfn_range() for now, or perhaps ever.

Design Concern
==============

The previous patch mapped the whole memory range targeted by kdump at
once into the linear direct-mapping region. But with that approach it
is difficult, in the worst case, to estimate the amount of memory used
for the page tables that map the range.

I then investigated improving on this by mapping only the DIMM ranges,
which are all expected to be 1GB-aligned, so that 1GB pages keep the
page tables small. I didn't choose this way for two reasons: first, a
memory hot-plugging issue where reading a physical memory hole has
undefined behaviour and typically hangs the system; second,
complicating any fix for the first issue, there is no reliable source
of the actually present DIMM ranges for the kernel to use; SMBIOS is
not sufficient since not all firmware exports them.

On the other hand, /dev/oldmem is a simple interface whose file offset
corresponds directly to a physical address in the whole system memory,
and even with this, there is enough room for a userland tool to improve
performance. For example, makedumpfile spends most of its time reading
huge contiguous memory sequentially, such as the huge mem_map array or
each chunk corresponding to a PT_LOAD entry. We can improve performance
considerably by using mmap() just there.

As a design decision, I didn't support mmap() on /proc/vmcore because
it abstracts the old memory as an ELF file, so there are ranges that
are contiguous on /proc/vmcore but not contiguous in the actual old
memory. For example, consider the ELF headers built in the 2nd kernel,
the note objects, and the memory chunks corresponding to PT_LOAD
entries of the first kernel. They are not contiguous in the old memory,
so remapping them so that /proc/vmcore appears contiguous using the
existing remap_pfn_range() needs some complicated work.

TODO
====

- fix makedumpfile to use mmap() on /dev/oldmem and benchmark it to
  confirm whether we can see enough performance improvement.

Test
====

Built and tested on x86_64.

Thanks.
HATAYAMA, Daisuke

From cf89aace87c8e7192909eb35334a139143a806e8 Mon Sep 17 00:00:00 2001
From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Date: Wed, 30 Jan 2013 13:02:02 +0900
Subject: [PATCH] kdump, oldmem: support mmap on /dev/oldmem

Support mmap() on /dev/oldmem to improve the performance of reading
/proc/vmcore. Currently, reads of /proc/vmcore are serviced by
read_oldmem(), which calls ioremap and iounmap once per page; for
example, if memory is 1GB, ioremap/iounmap is called (1GB / 4KB) times,
that is, 262144 times. This causes a big performance degradation.

With this patch, we saw an improvement on a simple benchmark from

  200 [MiB/sec] to over 100.00 [GiB/sec].

We don't permit mapping beyond saved_max_pfn, just as read_oldmem()
does not, and we map the memory as neither writable nor executable.

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---
 drivers/char/mem.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index c6fa3bc..e9046634 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -388,6 +388,32 @@ static ssize_t read_oldmem(struct file *file, char __user *buf,
 	}
 	return read;
 }
+
+/*
+ * Mmap memory corresponding to the old kernel.
+ */
+static int mmap_oldmem(struct file *file, struct vm_area_struct *vma)
+{
+	size_t size = vma->vm_end - vma->vm_start;
+	unsigned long pfn = vma->vm_pgoff;
+
+	if (pfn + (size >> PAGE_SHIFT) > saved_max_pfn + 1)
+		return -EINVAL;
+
+	if (vma->vm_flags & (VM_WRITE | VM_EXEC))
+		return -EPERM;
+
+	vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+
+	if (remap_pfn_range(vma,
+			    vma->vm_start,
+			    pfn,
+			    size,
+			    vma->vm_page_prot))
+		return -EAGAIN;
+
+	return 0;
+}
 #endif
 
 #ifdef CONFIG_DEVKMEM
@@ -806,6 +832,7 @@ static const struct file_operations oldmem_fops = {
 	.read	= read_oldmem,
 	.open	= open_oldmem,
 	.llseek = default_llseek,
+	.mmap	= mmap_oldmem,
 };
 #endif
 
-- 
1.7.7.6


[-- Attachment #2: tool.tar.gz --]
[-- Type: application/x-gzip, Size: 2173 bytes --]


* Re: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
  2013-02-04  4:59 [PATCH] kdump, oldmem: support mmap on /dev/oldmem Hatayama, Daisuke
@ 2013-02-05 15:12 ` Vivek Goyal
  2013-02-06  7:24   ` Hatayama, Daisuke
  0 siblings, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2013-02-05 15:12 UTC (permalink / raw)
  To: Hatayama, Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Mon, Feb 04, 2013 at 04:59:35AM +0000, Hatayama, Daisuke wrote:
> Support mmap() on /dev/oldmem to improve the performance of reading
> /proc/vmcore. Currently, reads of /proc/vmcore are serviced by
> read_oldmem(), which calls ioremap and iounmap once per page; for
> example, if memory is 1GB, ioremap/iounmap is called (1GB / 4KB) times,
> that is, 262144 times. This causes a big performance degradation.
> 
> With this patch, we saw an improvement on a simple benchmark from
> 
>   200 [MiB/sec] to over 100.00 [GiB/sec].

Impressive improvement. Thanks for the patch.

[..]
> As a design decision, I didn't support mmap() on /proc/vmcore because
> it abstracts the old memory as an ELF file, so there are ranges that
> are contiguous on /proc/vmcore but not contiguous in the actual old
> memory. For example, consider the ELF headers built in the 2nd kernel,
> the note objects, and the memory chunks corresponding to PT_LOAD
> entries of the first kernel. They are not contiguous in the old memory,
> so remapping them so that /proc/vmcore appears contiguous using the
> existing remap_pfn_range() needs some complicated work.

Can't we call remap_pfn_range() multiple times, once for each contiguous
range of memory? /proc/vmcore already has a list of contiguous memory
areas. So we can parse the user-passed file offset and size, map them to
the respective physical chunks, and call remap_pfn_range() on each of
those chunks.

I think supporting mmap() on both /dev/oldmem and /proc/vmcore would be
nice.

Agreed that supporting mmap() on /proc/vmcore is more work compared to
/dev/oldmem, but it should be doable.
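
Roughly, I am thinking of something like the following sketch. It is
not real code from fs/proc/vmcore.c; the 'vmcore_list' walking and the
'struct vmcore' fields (offset/size/paddr) are just assumptions here,
and error handling is simplified.

/* Sketch only: map a /proc/vmcore file range by walking an assumed
 * list of contiguous old-memory chunks and calling remap_pfn_range()
 * once per chunk that overlaps the requested range. */
static int mmap_vmcore_sketch(struct file *file, struct vm_area_struct *vma)
{
        u64 start = (u64)vma->vm_pgoff << PAGE_SHIFT;   /* file offset */
        u64 len = vma->vm_end - vma->vm_start;
        struct vmcore *m;

        list_for_each_entry(m, &vmcore_list, list) {
                u64 chunk_end = m->offset + m->size;

                if (start >= chunk_end || start + len <= m->offset)
                        continue;       /* no overlap with this chunk */

                /* Intersection of [start, start + len) with the chunk. */
                u64 from = start > m->offset ? start : m->offset;
                u64 to = start + len < chunk_end ? start + len : chunk_end;
                u64 paddr = m->paddr + (from - m->offset);
                unsigned long uaddr = vma->vm_start + (from - start);

                if (remap_pfn_range(vma, uaddr, paddr >> PAGE_SHIFT,
                                    to - from, vma->vm_page_prot))
                        return -EAGAIN;
        }
        return 0;
}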

Thanks
Vivek


* RE: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
  2013-02-05 15:12 ` Vivek Goyal
@ 2013-02-06  7:24   ` Hatayama, Daisuke
  2013-02-07 15:06     ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: Hatayama, Daisuke @ 2013-02-06  7:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
Date: Tue, 5 Feb 2013 10:12:56 -0500

> On Mon, Feb 04, 2013 at 04:59:35AM +0000, Hatayama, Daisuke wrote:

> [..]
>> As a design decision, I didn't support mmap() on /proc/vmcore because
>> it abstracts the old memory as an ELF file, so there are ranges that
>> are contiguous on /proc/vmcore but not contiguous in the actual old
>> memory. For example, consider the ELF headers built in the 2nd kernel,
>> the note objects, and the memory chunks corresponding to PT_LOAD
>> entries of the first kernel. They are not contiguous in the old memory,
>> so remapping them so that /proc/vmcore appears contiguous using the
>> existing remap_pfn_range() needs some complicated work.
> 
> Can't we call remap_pfn_range() multiple times, once for each contiguous
> range of memory? /proc/vmcore already has a list of contiguous memory
> areas. So we can parse the user-passed file offset and size, map them to
> the respective physical chunks, and call remap_pfn_range() on each of
> those chunks.
> 
> I think supporting mmap() on both /dev/oldmem and /proc/vmcore would be
> nice.
> 
> Agreed that supporting mmap() on /proc/vmcore is more work compared to
> /dev/oldmem, but it should be doable.
> 

The complication in supporting mmap() on /proc/vmcore lies on the kdump
side. Objects exported through /proc/vmcore need to be page-size aligned
within /proc/vmcore. This comes from the restriction of mmap() that both
the user-space address and the physical address must be page-size
aligned.

As I said in the description, the objects implicitly referenced by
/proc/vmcore are

  - ELF headers,
  - NOTE objects (NT_PRSTATUS entries x cpus, VMCOREINFO), and
  - memory chunks x (the number of PT_LOAD entries).

Note objects are scattered over the old memory. They are exported as a
single NOTE entry in the program headers, so they need to be gathered
into one location in the 2nd kernel, starting from a page-size aligned
address.

VMCOREINFO is about 1.5KB on a 2.6.32 kernel. One NT_PRSTATUS is 355
bytes. The recent limit of NR_CPUS is 5120 on x86_64. So less than about
2 MB is enough even in the worst case.
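
As a quick self-contained check of that estimate (the sizes below are
just the figures quoted above, not derived from kernel structures):

#include <stdio.h>

int main(void)
{
        const long nt_prstatus = 355;   /* bytes per NT_PRSTATUS note */
        const long max_cpus = 5120;     /* NR_CPUS limit on x86_64 */
        const long vmcoreinfo = 1536;   /* ~1.5 KB */
        long worst = max_cpus * nt_prstatus + vmcoreinfo;

        /* Prints about 1.8 million bytes, i.e. well under 2 MB. */
        printf("worst-case note buffer: %ld bytes (~%.2f MiB)\n",
               worst, worst / (1024.0 * 1024.0));
        return 0;
}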

Note that the format of /proc/vmcore needs to change, since the offset
of each object needs to be page-size aligned.

Thanks.
HATAYAMA, Daisuke



* Re: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
  2013-02-06  7:24   ` Hatayama, Daisuke
@ 2013-02-07 15:06     ` Vivek Goyal
  2013-02-08  0:25       ` Hatayama, Daisuke
  0 siblings, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2013-02-07 15:06 UTC (permalink / raw)
  To: Hatayama, Daisuke
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

On Wed, Feb 06, 2013 at 07:24:46AM +0000, Hatayama, Daisuke wrote:
> From: Vivek Goyal <vgoyal@redhat.com>
> Subject: Re: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
> Date: Tue, 5 Feb 2013 10:12:56 -0500
> 
> > On Mon, Feb 04, 2013 at 04:59:35AM +0000, Hatayama, Daisuke wrote:
> 
> > [..]
> >> As a design decision, I didn't support mmap() on /proc/vmcore because
> >> it abstracts the old memory as an ELF file, so there are ranges that
> >> are contiguous on /proc/vmcore but not contiguous in the actual old
> >> memory. For example, consider the ELF headers built in the 2nd kernel,
> >> the note objects, and the memory chunks corresponding to PT_LOAD
> >> entries of the first kernel. They are not contiguous in the old memory,
> >> so remapping them so that /proc/vmcore appears contiguous using the
> >> existing remap_pfn_range() needs some complicated work.
> > 
> > Can't we call remap_pfn_range() multiple times, once for each contiguous
> > range of memory? /proc/vmcore already has a list of contiguous memory
> > areas. So we can parse the user-passed file offset and size, map them to
> > the respective physical chunks, and call remap_pfn_range() on each of
> > those chunks.
> > 
> > I think supporting mmap() on both /dev/oldmem and /proc/vmcore would be
> > nice.
> > 
> > Agreed that supporting mmap() on /proc/vmcore is more work compared to
> > /dev/oldmem, but it should be doable.
> > 
> 
> The complication in supporting mmap() on /proc/vmcore lies on the kdump
> side. Objects exported through /proc/vmcore need to be page-size aligned
> within /proc/vmcore. This comes from the restriction of mmap() that both
> the user-space address and the physical address must be page-size
> aligned.
> 
> As I said in the description, the objects implicitly referenced by
> /proc/vmcore are
> 
>   - ELF headers,
>   - NOTE objects (NT_PRSTATUS entries x cpus, VMCOREINFO), and
>   - memory chunks x (the number of PT_LOAD entries).
> 
> Note objects are scattered over the old memory. They are exported as a
> single NOTE entry in the program headers, so they need to be gathered
> into one location in the 2nd kernel, starting from a page-size aligned
> address.
> 
> VMCOREINFO is about 1.5KB on a 2.6.32 kernel. One NT_PRSTATUS is 355
> bytes. The recent limit of NR_CPUS is 5120 on x86_64. So less than about
> 2 MB is enough even in the worst case.
> 
> Note that the format of /proc/vmcore needs to change, since the offset
> of each object needs to be page-size aligned.

Ok, got it. So everything needs to be page aligned, and if the size is
not sufficient then we need a way to pad memory areas to make the next
object page aligned.
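
Something like the following layout sketch is what I have in mind; the
object sizes are placeholders for illustration, only the rounding up to
a page boundary matters:

#include <stdio.h>

#define PAGE_SIZE       4096UL
#define PAGE_ALIGN(x)   (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

int main(void)
{
        unsigned long ehdr_size = 1864, notes_size = 1819136; /* placeholders */
        unsigned long off = 0;
        unsigned long ehdr_off = off;

        off = PAGE_ALIGN(off + ehdr_size);      /* pad after ELF headers */
        unsigned long notes_off = off;

        off = PAGE_ALIGN(off + notes_size);     /* pad after the note buffer */
        unsigned long load_off = off;

        printf("ehdr@%lu notes@%lu first PT_LOAD data@%lu\n",
               ehdr_off, notes_off, load_off);
        return 0;
}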

To begin with, supporting mmap on /dev/oldmem is fine with me. Once that
gets in, it will be good to look at how to make all the individual items
page aligned so that mmap can be supported on /proc/vmcore.

Thanks
Vivek


* RE: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
  2013-02-07 15:06     ` Vivek Goyal
@ 2013-02-08  0:25       ` Hatayama, Daisuke
  2013-02-08  0:33         ` Hatayama, Daisuke
  0 siblings, 1 reply; 7+ messages in thread
From: Hatayama, Daisuke @ 2013-02-08  0:25 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: ebiederm, cpw, kumagai-atsushi, lisa.mitchell, kexec, linux-kernel

> From: Vivek Goyal [mailto:vgoyal@redhat.com]
> Sent: Friday, February 08, 2013 12:06 AM

> On Wed, Feb 06, 2013 at 07:24:46AM +0000, Hatayama, Daisuke wrote:
> > From: Vivek Goyal <vgoyal@redhat.com>
> > Subject: Re: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
> > Date: Tue, 5 Feb 2013 10:12:56 -0500
> >
> > > On Mon, Feb 04, 2013 at 04:59:35AM +0000, Hatayama, Daisuke wrote:
> >
> > > [..]
> > >> As a design decision, I didn't support mmap() on /proc/vmcore because
> > >> it abstracts the old memory as an ELF file, so there are ranges that
> > >> are contiguous on /proc/vmcore but not contiguous in the actual old
> > >> memory. For example, consider the ELF headers built in the 2nd kernel,
> > >> the note objects, and the memory chunks corresponding to PT_LOAD
> > >> entries of the first kernel. They are not contiguous in the old memory,
> > >> so remapping them so that /proc/vmcore appears contiguous using the
> > >> existing remap_pfn_range() needs some complicated work.
> > >
> > > Can't we call remap_pfn_range() multiple times, once for each
> > > contiguous range of memory? /proc/vmcore already has a list of
> > > contiguous memory areas. So we can parse the user-passed file offset
> > > and size, map them to the respective physical chunks, and call
> > > remap_pfn_range() on each of those chunks.
> > >
> > > I think supporting mmap() on both /dev/oldmem and /proc/vmcore would
> > > be nice.
> > >
> > > Agreed that supporting mmap() on /proc/vmcore is more work compared
> > > to /dev/oldmem, but it should be doable.
> > >
> >
> > The complication in supporting mmap() on /proc/vmcore lies on the kdump
> > side. Objects exported through /proc/vmcore need to be page-size aligned
> > within /proc/vmcore. This comes from the restriction of mmap() that both
> > the user-space address and the physical address must be page-size
> > aligned.
> >
> > As I said in the description, the objects implicitly referenced by
> > /proc/vmcore are
> >
> >   - ELF headers,
> >   - NOTE objects (NT_PRSTATUS entries x cpus, VMCOREINFO), and
> >   - memory chunks x (the number of PT_LOAD entries).
> >
> > Note objects are scattered over the old memory. They are exported as a
> > single NOTE entry in the program headers, so they need to be gathered
> > into one location in the 2nd kernel, starting from a page-size aligned
> > address.
> >
> > VMCOREINFO is about 1.5KB on a 2.6.32 kernel. One NT_PRSTATUS is 355
> > bytes. The recent limit of NR_CPUS is 5120 on x86_64. So less than
> > about 2 MB is enough even in the worst case.
> >
> > Note that the format of /proc/vmcore needs to change, since the offset
> > of each object needs to be page-size aligned.
> 
> Ok, got it. So everything needs to be page aligned, and if the size is
> not sufficient then we need a way to pad memory areas to make the next
> object page aligned.
> 
> To begin with, supporting mmap on /dev/oldmem is fine with me. Once that
> gets in, it will be good to look at how to make all the individual items
> page aligned so that mmap can be supported on /proc/vmcore.

I'm already working on the patch set. When I was writing the /dev/oldmem patch, I mistakenly thought remap_pfn_range() would have to be rewritten, but the complication is in fact smaller. I will post it early next week.

By the way, the third argument pfn of remap_pfn_range() is defined as unsigned long, which is 4 bytes on 32-bit x86. With PAE paging, 32-bit linear addresses can be translated to physical addresses of up to 52 bits (4 PiB). We need 40 bits to fully represent all page frame numbers in a 52-bit physical address space, so 4 bytes is not enough. But it seems unlikely to me that there are users who want to use that much memory with a 32-bit kernel, so I will not support 32-bit x86, at least in the next patch.
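
For reference, the prototype I mean is roughly the following (as I read
it in include/linux/mm.h; please check the exact tree before relying on
it):

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
                    unsigned long pfn, unsigned long size, pgprot_t prot);
/* On 32-bit x86, unsigned long is 32 bits, while a 52-bit physical
 * address space needs 52 - 12 = 40 bits of pfn. */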

Thanks.
HATAYAMA, Daisuke



* RE: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
  2013-02-08  0:25       ` Hatayama, Daisuke
@ 2013-02-08  0:33         ` Hatayama, Daisuke
  2013-02-11 20:44           ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: Hatayama, Daisuke @ 2013-02-08  0:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: kexec, linux-kernel, lisa.mitchell, kumagai-atsushi, ebiederm, cpw

> From: kexec-bounces@lists.infradead.org
> Sent: Friday, February 08, 2013 9:26 AM

> > From: Vivek Goyal [mailto:vgoyal@redhat.com]
> > Sent: Friday, February 08, 2013 12:06 AM
> 
> > On Wed, Feb 06, 2013 at 07:24:46AM +0000, Hatayama, Daisuke wrote:
> > > From: Vivek Goyal <vgoyal@redhat.com>
> > > Subject: Re: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
> > > Date: Tue, 5 Feb 2013 10:12:56 -0500
> > >
> > > > On Mon, Feb 04, 2013 at 04:59:35AM +0000, Hatayama, Daisuke wrote:
> > >
> > > > [..]
> > > >> As a design decision, I didn't support mmap() on /proc/vmcore
> > > >> because it abstracts the old memory as an ELF file, so there are
> > > >> ranges that are contiguous on /proc/vmcore but not contiguous in
> > > >> the actual old memory. For example, consider the ELF headers built
> > > >> in the 2nd kernel, the note objects, and the memory chunks
> > > >> corresponding to PT_LOAD entries of the first kernel. They are not
> > > >> contiguous in the old memory, so remapping them so that
> > > >> /proc/vmcore appears contiguous using the existing
> > > >> remap_pfn_range() needs some complicated work.
> > > >
> > > > Can't we call remap_pfn_range() multiple times, once for each
> > > > contiguous range of memory? /proc/vmcore already has a list of
> > > > contiguous memory areas. So we can parse the user-passed file
> > > > offset and size, map them to the respective physical chunks, and
> > > > call remap_pfn_range() on each of those chunks.
> > > >
> > > > I think supporting mmap() on both /dev/oldmem and /proc/vmcore
> > > > would be nice.
> > > >
> > > > Agreed that supporting mmap() on /proc/vmcore is more work
> > > > compared to /dev/oldmem, but it should be doable.
> > > >
> > >
> > > The complication in supporting mmap() on /proc/vmcore lies on the
> > > kdump side. Objects exported through /proc/vmcore need to be
> > > page-size aligned within /proc/vmcore. This comes from the
> > > restriction of mmap() that both the user-space address and the
> > > physical address must be page-size aligned.
> > >
> > > As I said in the description, the objects implicitly referenced by
> > > /proc/vmcore are
> > >
> > >   - ELF headers,
> > >   - NOTE objects (NT_PRSTATUS entries x cpus, VMCOREINFO), and
> > >   - memory chunks x (the number of PT_LOAD entries).
> > >
> > > Note objects are scattered over the old memory. They are exported
> > > as a single NOTE entry in the program headers, so they need to be
> > > gathered into one location in the 2nd kernel, starting from a
> > > page-size aligned address.
> > >
> > > VMCOREINFO is about 1.5KB on a 2.6.32 kernel. One NT_PRSTATUS is
> > > 355 bytes. The recent limit of NR_CPUS is 5120 on x86_64. So less
> > > than about 2 MB is enough even in the worst case.
> > >
> > > Note that the format of /proc/vmcore needs to change, since the
> > > offset of each object needs to be page-size aligned.
> >
> > Ok, got it. So everything needs to be page aligned, and if the size is
> > not sufficient then we need a way to pad memory areas to make the next
> > object page aligned.
> >
> > To begin with, supporting mmap on /dev/oldmem is fine with me. Once
> > that gets in, it will be good to look at how to make all the
> > individual items page aligned so that mmap can be supported on
> > /proc/vmcore.
> 
> I'm already working on the patch set. When I was writing the /dev/oldmem
> patch, I mistakenly thought remap_pfn_range() would have to be rewritten,
> but the complication is in fact smaller. I will post it early next week.
> 
> By the way, the third argument pfn of remap_pfn_range() is defined as
> unsigned long, which is 4 bytes on 32-bit x86. With PAE paging, 32-bit
> linear addresses can be translated to physical addresses of up to 52
> bits (4 PiB). We need 40 bits to fully represent all page frame numbers
> in a 52-bit physical address space, so 4 bytes is not enough. But it
> seems unlikely to me that there are users who want to use that much
> memory with a 32-bit kernel, so I will not support 32-bit x86, at least
> in the next patch.

Also, remap_pfn_range() is a function exported to kernel modules, so changing the type of its third argument means changing the ABI. Would introducing something like remap_pfn_range_64() be a good idea, if someone needs this feature on a 32-bit kernel?

Thanks.
HATAYAMA, Daisuke



* Re: [PATCH] kdump, oldmem: support mmap on /dev/oldmem
  2013-02-08  0:33         ` Hatayama, Daisuke
@ 2013-02-11 20:44           ` Vivek Goyal
  0 siblings, 0 replies; 7+ messages in thread
From: Vivek Goyal @ 2013-02-11 20:44 UTC (permalink / raw)
  To: Hatayama, Daisuke
  Cc: kexec, linux-kernel, lisa.mitchell, kumagai-atsushi, ebiederm, cpw

On Fri, Feb 08, 2013 at 12:33:49AM +0000, Hatayama, Daisuke wrote:

[..]
> Also, remap_pfn_range() is a function exported to kernel modules, so changing the type of its third argument means changing the ABI. Would introducing something like remap_pfn_range_64() be a good idea, if someone needs this feature on a 32-bit kernel?

I thought the kernel does not care about compatibility with out-of-tree
modules, and any user of an in-tree API is updated to use the new API.
I could be wrong though...

Thanks
Vivek

