From: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
To: vgoyal@redhat.com
Cc: ebiederm@xmission.com, cpw@sgi.com,
	kumagai-atsushi@mxc.nes.nec.co.jp, lisa.mitchell@hp.com,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel
Date: Tue, 19 Feb 2013 02:02:34 +0900 (JST)
Message-ID: <20130219.020234.255275278.d.hatayama@jp.fujitsu.com>
In-Reply-To: <20130215165327.GH27784@redhat.com>

From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel
Date: Fri, 15 Feb 2013 11:53:27 -0500

> On Thu, Feb 14, 2013 at 07:12:32PM +0900, HATAYAMA Daisuke wrote:
>> Objects exported from ELF note segments are in fact located apart from
>> each other in old memory, but on /proc/vmcore they are exported as a
>> single ELF note segment. To satisfy mmap()'s page-size boundary
>> requirement, copy them into a page-size aligned buffer allocated by
>> __get_free_pages() on the 2nd kernel and remap the buffer to user-space.
>> 
>> The buffer for ELF note segments is added to vmcore_list as an object
>> of type VMCORE_2ND_KERNEL.
>> 
>> ELF note segments are copied in two passes: the first pass calculates
>> the real total size of the ELF note segments, and the second pass
>> copies the segment data into a buffer of that size.
> 
> Ok, this is the part I am not very happy with. I don't like the idea
> of copying notes into second kernel. It has potential to bloat our
> memory usage requirements in second kernel.
> 
> For example, if we allocate a 4K page for each cpu, a huge machine with,
> say, 4096 cpus requires 16MB more memory. Not that this is a big
> concern for a 4K-cpu machine, but if we can avoid copying notes from
> the previous kernel, that will be good.

I also estimated the worst case, but more optimistically than you did.
In my estimate it was less than 2MB on x86_64:
roundup(5112 cpus x sizeof(struct user_regs_struct), PAGE_SIZE) is
about 1MB. But I didn't consider other architectures, and I have now
noticed that s390 collects notes more aggressively.
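
To make the two estimates above concrete, here is a small C sketch of the arithmetic. It assumes 4 KiB pages and takes the x86_64 register structure (struct user_regs_struct) as 216 bytes, i.e. 27 eight-byte registers; both constants are assumptions of this example, and the sketch ignores the ELF note header and name that a real NT_PRSTATUS note also carries.

```c
#include <assert.h>

/* Assumptions for illustration only. */
#define PAGE_SIZE_ASSUMED 4096UL /* 4 KiB pages */
#define USER_REGS_SIZE    216UL  /* x86_64 struct user_regs_struct: 27 x 8 bytes */

/* Round x up to the next multiple of a, like the kernel's roundup(). */
static unsigned long roundup_ul(unsigned long x, unsigned long a)
{
	assert(a != 0);
	return (x + a - 1) / a * a;
}

/* Estimate 1: one full page of note data per CPU. */
static unsigned long per_cpu_page_estimate(unsigned long ncpus)
{
	return ncpus * PAGE_SIZE_ASSUMED;
}

/* Estimate 2: register data packed back to back, rounded up once. */
static unsigned long packed_estimate(unsigned long ncpus)
{
	return roundup_ul(ncpus * USER_REGS_SIZE, PAGE_SIZE_ASSUMED);
}
```

With these assumptions, per_cpu_page_estimate(4096) gives 16777216 bytes (16MB) and packed_estimate(5112) gives 1105920 bytes (about 1MB), matching the two figures discussed above.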

> 
> So the problem is that the size of a note from the previous kernel might
> not be page aligned. And in the /proc/vmcore view all the notes are
> supposed to be contiguous.
> 
> Thinking out loud.
> 
> - Can we introduce multiple PT_NOTE program headers, one for each note's
>   data? I am not sure if this will break existing user space tools like
>   gdb, crash, etc.
> 
> - Or can we pad the notes with a new note type, say "VMCORE_PAD"? This is
>   similar to "VMCOREINFO", except that it is used for padding to make sure
>   notes can be page aligned. User space tools should simply ignore
>   the VMCORE_PAD notes and move on to the next note.
> 
> I think I like the second idea better, and given the fact that gdb did not
> break with the introduction of the "VMCOREINFO" note type, it should not
> break when we introduce another note type.
> 
> If this works, you don't have to copy notes in the second kernel?
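
A rough sketch of how large a VMCORE_PAD note would need to be, assuming the usual 4-byte-aligned ELF note layout (a 12-byte header of n_namesz, n_descsz and n_type, followed by the NUL-terminated name and the descriptor, each padded to 4 bytes) and 4 KiB pages. The note name comes from the proposal above; no such note type exists yet, and the helper name pad_note_size() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u        /* assumption: 4 KiB pages */
#define PAD_NOTE_NAME     "VMCORE_PAD" /* name proposed above; hypothetical */

/* Same field layout as Elf64_Nhdr: three 32-bit words (12 bytes). */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* ELF note name and descriptor fields are padded to 4 bytes. */
static size_t align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: total size of the pad note to place at offset
 * 'off' of the note segment so that the segment ends on a page boundary.
 * Returns 0 if 'off' is already page aligned. If the remaining gap is too
 * small to hold even an empty pad note (header + name), pad up to the
 * end of the following page instead.
 */
static size_t pad_note_size(size_t off)
{
	size_t min = sizeof(struct note_hdr) + align4(sizeof(PAD_NOTE_NAME));
	size_t gap = (PAGE_SIZE_ASSUMED - off % PAGE_SIZE_ASSUMED) % PAGE_SIZE_ASSUMED;

	assert(off % 4 == 0); /* notes are laid out on 4-byte boundaries */
	if (gap == 0)
		return 0;
	if (gap < min)
		gap += PAGE_SIZE_ASSUMED;
	return gap; /* the pad note's n_descsz would be gap - min */
}
```

Under these assumptions an empty pad note takes 24 bytes, so a note segment ending at offset 4000 needs a 96-byte pad note (n_descsz 72), while one ending at offset 4088 leaves only 8 bytes, too few for a pad note, so it is padded to the end of the next page (4104 bytes). A reader that steps over notes using n_namesz and n_descsz skips the zero-filled descriptor without needing to understand it.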

I also think the second one is better. Yes, in fact I had already had a
similar idea, and it is certainly possible.

If I remember correctly, I have never seen tools that assume multiple
PT_NOTE entries. Also, tools like gdb interpret note information not
only by its contents but also by its order. For example, the n-th
NT_PRSTATUS note is considered to be the n-th thread's or the n-th
CPU's data. It seems to me that supporting multiple PT_NOTE entries
would make things unnecessarily complicated.
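
As an illustration of interpreting notes by order, here is a sketch of a reader that walks one PT_NOTE segment and returns the offset of the n-th "CORE"/NT_PRSTATUS note, which a tool would treat as the n-th CPU's (or thread's) registers. It is not gdb's or crash's actual code: put_note() and nth_prstatus_offset() are hypothetical helpers, and NT_PRSTATUS is defined locally with its standard value 1. Any other note, such as VMCOREINFO or the proposed pad note, is skipped by its sizes without disturbing the numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef NT_PRSTATUS
#define NT_PRSTATUS 1 /* standard ELF core note type for prstatus/registers */
#endif

/* Same field layout as Elf64_Nhdr. */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

static size_t align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/* Hypothetical test helper: append a note with a zero-filled descriptor. */
static size_t put_note(unsigned char *buf, size_t off, const char *name,
		       uint32_t type, uint32_t descsz)
{
	struct note_hdr h;
	size_t namesz = strlen(name) + 1; /* includes the trailing NUL */

	h.n_namesz = (uint32_t)namesz;
	h.n_descsz = descsz;
	h.n_type = type;
	memcpy(buf + off, &h, sizeof(h));
	off += sizeof(h);
	memset(buf + off, 0, align4(namesz) + align4(descsz));
	memcpy(buf + off, name, namesz);
	return off + align4(namesz) + align4(descsz);
}

/*
 * Return the offset of the cpu-th "CORE"/NT_PRSTATUS note in the segment,
 * or -1 if there is no such note. Other notes are skipped by size only.
 */
static long nth_prstatus_offset(const unsigned char *seg, size_t len, int cpu)
{
	size_t off = 0;
	int seen = 0;

	assert(cpu >= 0);
	while (off + sizeof(struct note_hdr) <= len) {
		struct note_hdr h;
		size_t next;

		memcpy(&h, seg + off, sizeof(h));
		next = off + sizeof(h) + align4(h.n_namesz) + align4(h.n_descsz);
		if (next > len)
			break; /* truncated or corrupt note: stop */
		if (h.n_type == NT_PRSTATUS && h.n_namesz == sizeof("CORE") &&
		    memcmp(seg + off + sizeof(h), "CORE", sizeof("CORE")) == 0) {
			if (seen == cpu)
				return (long)off;
			seen++;
		}
		off = next;
	}
	return -1;
}
```

Inserting a pad note between two NT_PRSTATUS notes moves the second one to a later offset but does not change which note is reported for CPU 1, which is the property that a single PT_NOTE segment with pad notes preserves.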

BTW, in the kexec/kdump design, we never assume that the first and the
second kernels are the same. This means that we cannot assume that the
first kernel always places its notes on page-size boundaries in the
above way. So we need to check, entry by entry, whether each note is
on a page-size boundary, and if an entry is not, copy it in the 2nd
kernel (and append the pad note to it). Copying is still necessary in
the worst case.

Anyway, in summary, here is what I'll do in the next version:

- append pad notes to each note on the 1st kernel on every
  architecture, and
- check whether each note is on a page-size boundary, and if not, copy
  it in the 2nd kernel and then append pad notes to it.
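
The second item of the plan, copying a misaligned note in the 2nd kernel and appending a pad note, can be sketched in user space as follows. C11 aligned_alloc() stands in for the kernel page allocator (the patch under discussion uses __get_free_pages()), copy_and_pad_note() is a hypothetical name, and the pad note's name and n_type value are assumptions based on the VMCORE_PAD proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_ASSUMED 4096u        /* assumption: 4 KiB pages */
#define PAD_NOTE_NAME     "VMCORE_PAD" /* hypothetical, from the proposal */
#define PAD_NOTE_TYPE     0u           /* hypothetical n_type value */

/* Same field layout as Elf64_Nhdr. */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

static size_t align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/*
 * Copy one note of 'len' bytes (a non-zero multiple of 4) into a new
 * page-aligned buffer and, unless 'len' is already a page multiple,
 * append a pad note so the buffer is a whole number of pages.
 * Returns the buffer (free() it when done) and its size in *out_len,
 * or NULL if allocation fails.
 */
static unsigned char *copy_and_pad_note(const void *note, size_t len,
					size_t *out_len)
{
	size_t min_pad = sizeof(struct note_hdr) + align4(sizeof(PAD_NOTE_NAME));
	size_t total = len;
	unsigned char *buf;

	assert(len > 0 && len % 4 == 0);
	if (len % PAGE_SIZE_ASSUMED != 0) /* leave room for at least an empty pad note */
		total = (len + min_pad + PAGE_SIZE_ASSUMED - 1) /
			PAGE_SIZE_ASSUMED * PAGE_SIZE_ASSUMED;

	buf = aligned_alloc(PAGE_SIZE_ASSUMED, total);
	if (!buf)
		return NULL;
	memcpy(buf, note, len);
	if (total > len) {
		struct note_hdr h;

		memset(buf + len, 0, total - len);
		h.n_namesz = sizeof(PAD_NOTE_NAME); /* includes the NUL */
		h.n_descsz = (uint32_t)(total - len - min_pad);
		h.n_type = PAD_NOTE_TYPE;
		memcpy(buf + len, &h, sizeof(h));
		memcpy(buf + len + sizeof(h), PAD_NOTE_NAME, sizeof(PAD_NOTE_NAME));
	}
	*out_len = total;
	return buf;
}
```

Under these assumptions a 356-byte note becomes exactly one page: the note itself followed by a pad note whose n_descsz is 4096 - 356 - 24 = 3716 bytes, so the copy can be remapped to user space as a whole page.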

Thanks.
HATAYAMA, Daisuke


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Thread overview: 66+ messages
2013-02-14 10:11 [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore HATAYAMA Daisuke
2013-02-14 10:11 ` [PATCH 01/13] vmcore: allocate buffer for ELF headers on page-size alignment HATAYAMA Daisuke
2013-02-15 15:01   ` Vivek Goyal
2013-02-14 10:11 ` [PATCH 02/13] vmcore: round up buffer size of ELF headers by PAGE_SIZE HATAYAMA Daisuke
2013-02-15 15:18   ` Vivek Goyal
2013-02-18 15:58     ` HATAYAMA Daisuke
2013-02-14 10:11 ` [PATCH 03/13] vmcore: fill unused part of buffer for ELF headers with 0 HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 04/13] vmcore: introduce types for objects copied in 2nd kernel HATAYAMA Daisuke
2013-02-15 15:28   ` Vivek Goyal
2013-02-18 16:06     ` HATAYAMA Daisuke
2013-02-19 23:07       ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 05/13] vmcore: modify ELF32 code according to new type HATAYAMA Daisuke
2013-02-15 15:30   ` Vivek Goyal
2013-02-18 16:11     ` HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 06/13] vmcore: modify vmcore clean-up function to free buffer on 2nd kernel HATAYAMA Daisuke
2013-02-15 15:32   ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 07/13] vmcore: modify read_vmcore() to read " HATAYAMA Daisuke
2013-02-15 15:51   ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 08/13] vmcore: remove unused helper function HATAYAMA Daisuke
2013-02-15 15:52   ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 09/13] vmcore: copy ELF note segments in buffer on 2nd kernel HATAYAMA Daisuke
2013-02-15 16:53   ` Vivek Goyal
2013-02-18 17:02     ` HATAYAMA Daisuke [this message]
2013-02-19 23:05       ` Vivek Goyal
2013-02-14 10:12 ` [PATCH 10/13] vmcore: round-up offset of vmcore object in page-size boundary HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 11/13] vmcore: count holes generated by round-up operation for vmcore size HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 12/13] vmcore: copy non page-size aligned head and tail pages in 2nd kernel HATAYAMA Daisuke
2013-02-14 10:12 ` [PATCH 13/13] vmcore: introduce mmap_vmcore() HATAYAMA Daisuke
2013-02-15  3:57 ` [PATCH 00/13] kdump, vmcore: support mmap() on /proc/vmcore Atsushi Kumagai
2013-02-18  0:16   ` Hatayama, Daisuke
2013-03-27  5:51 ` makedumpfile mmap() benchmark Jingbai Ma
2013-03-27  6:23   ` HATAYAMA Daisuke
2013-03-27  6:35     ` Jingbai Ma
