linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch v2] align crash_notes allocation to make it be inside one physical page
@ 2015-08-03 12:50 Baoquan He
  2015-08-03 22:04 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Baoquan He @ 2015-08-03 12:50 UTC (permalink / raw)
  To: akpm, ebiederm, vgoyal, dyoung, mhuang, lisa.mitchell, tatsu,
	seiji.aguchi.tr
  Cc: linux-kernel, kexec, Baoquan He

People reported that crash_notes in /proc/vmcore were corrupted and
this cause crash kdump failure. With code debugging and log we got
the root cause. This is because percpu variable crash_notes are
allocated in 2 vmalloc pages. Currently percpu is based on vmalloc
by default. Vmalloc can't guarantee 2 continuous vmalloc pages
are also on 2 continuous physical pages. So when 1st kernel exports
the starting address and size of crash_notes through sysfs like below:

/sys/devices/system/cpu/cpux/crash_notes
/sys/devices/system/cpu/cpux/crash_notes_size

kdump kernel use them to get the content of crash_notes. However the
2nd part may not be in the next neighbouring physical page as we
expected if crash_notes are allocated accross 2 vmalloc pages. That's
why nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be very huge in
update_note_header_size_elf64() and cause note header merging failure
or some warnings.

In this patch change to call __alloc_percpu() to passed in the align
value by rounding crash_notes_size up to the nearest power of two.
This make sure the crash_notes is allocated inside one physical page
since sizeof(note_buf_t) in all ARCHS is smaller than PAGE_SIZE.
Meanwhile add a WARN_ON in case it grows to be bigger than PAGE_SIZE
in the future.

v1->v2:
    Minfei mentioned percpu can't take align value bigger then PAGE_SIZE.
    So limit align to be PAGE_SIZE at most.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 kernel/kexec.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c10..9f4b070 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
 static int __init crash_notes_memory_init(void)
 {
 	/* Allocate memory for saving cpu registers. */
-	crash_notes = alloc_percpu(note_buf_t);
+	size_t size, align;
+	int order;
+
+	size = sizeof(note_buf_t);
+	order = get_count_order(size);
+	align = min_t(size_t, (1<<order), PAGE_SIZE);
+
+	WARN_ON(size > PAGE_SIZE);
+
+	crash_notes = __alloc_percpu(size, align);
 	if (!crash_notes) {
 		pr_warn("Kexec: Memory allocation for saving cpu register states failed\n");
 		return -ENOMEM;
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Patch v2] align crash_notes allocation to make it be inside one physical page
  2015-08-03 12:50 [Patch v2] align crash_notes allocation to make it be inside one physical page Baoquan He
@ 2015-08-03 22:04 ` Andrew Morton
  2015-08-03 23:01   ` Baoquan He
  2015-08-11  6:33   ` Baoquan He
  0 siblings, 2 replies; 6+ messages in thread
From: Andrew Morton @ 2015-08-03 22:04 UTC (permalink / raw)
  To: Baoquan He
  Cc: ebiederm, vgoyal, dyoung, mhuang, lisa.mitchell, tatsu,
	seiji.aguchi.tr, linux-kernel, kexec

On Mon,  3 Aug 2015 20:50:43 +0800 Baoquan He <bhe@redhat.com> wrote:

> People reported that crash_notes in /proc/vmcore were corrupted and
> this cause crash kdump failure. With code debugging and log we got
> the root cause. This is because percpu variable crash_notes are
> allocated in 2 vmalloc pages. Currently percpu is based on vmalloc
> by default. Vmalloc can't guarantee 2 continuous vmalloc pages
> are also on 2 continuous physical pages. So when 1st kernel exports
> the starting address and size of crash_notes through sysfs like below:
> 
> /sys/devices/system/cpu/cpux/crash_notes
> /sys/devices/system/cpu/cpux/crash_notes_size
> 
> kdump kernel use them to get the content of crash_notes. However the
> 2nd part may not be in the next neighbouring physical page as we
> expected if crash_notes are allocated accross 2 vmalloc pages. That's
> why nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be very huge in
> update_note_header_size_elf64() and cause note header merging failure
> or some warnings.
> 
> In this patch change to call __alloc_percpu() to passed in the align
> value by rounding crash_notes_size up to the nearest power of two.
> This make sure the crash_notes is allocated inside one physical page
> since sizeof(note_buf_t) in all ARCHS is smaller than PAGE_SIZE.
> Meanwhile add a WARN_ON in case it grows to be bigger than PAGE_SIZE
> in the future.
> 
> ...
>
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  static int __init crash_notes_memory_init(void)
>  {
>  	/* Allocate memory for saving cpu registers. */
> -	crash_notes = alloc_percpu(note_buf_t);
> +	size_t size, align;
> +	int order;
> +
> +	size = sizeof(note_buf_t);
> +	order = get_count_order(size);
> +	align = min_t(size_t, (1<<order), PAGE_SIZE);
> +
> +	WARN_ON(size > PAGE_SIZE);
> +
> +	crash_notes = __alloc_percpu(size, align);

A code comment would be helpful - the reason for this code's existence
is otherwise utterly unobvious.

I think it can be done this way:

	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);


I never noticed get_count_order() before.  afaict it does the same as
order_base_2(), except get_count_order() generates better code and has
a ridiculous name.

And I think the WARN_ON can be replaced with a
BUILD_BUG_ON(sizeof>PAGE_SIZE)?  That would avoid adding runtime
overhead.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch v2] align crash_notes allocation to make it be inside one physical page
  2015-08-03 22:04 ` Andrew Morton
@ 2015-08-03 23:01   ` Baoquan He
  2015-08-11  6:33   ` Baoquan He
  1 sibling, 0 replies; 6+ messages in thread
From: Baoquan He @ 2015-08-03 23:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: ebiederm, vgoyal, dyoung, mhuang, lisa.mitchell, tatsu,
	seiji.aguchi.tr, linux-kernel, kexec

Hi Andrew,

Thanks a lot for your reviewing and suggestiong.

On 08/03/15 at 03:04pm, Andrew Morton wrote:
> On Mon,  3 Aug 2015 20:50:43 +0800 Baoquan He <bhe@redhat.com> wrote:
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> >  static int __init crash_notes_memory_init(void)
> >  {
> >  	/* Allocate memory for saving cpu registers. */
> > -	crash_notes = alloc_percpu(note_buf_t);
> > +	size_t size, align;
> > +	int order;
> > +
> > +	size = sizeof(note_buf_t);
> > +	order = get_count_order(size);
> > +	align = min_t(size_t, (1<<order), PAGE_SIZE);
> > +
> > +	WARN_ON(size > PAGE_SIZE);
> > +
> > +	crash_notes = __alloc_percpu(size, align);
> 
> A code comment would be helpful - the reason for this code's existence
> is otherwise utterly unobvious.

Will add in new post.

> 
> I think it can be done this way:
> 
> 	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
> 
> 
> I never noticed get_count_order() before.  afaict it does the same as
> order_base_2(), except get_count_order() generates better code and has
> a ridiculous name.

OK, will change the code as you suggested.

> 
> And I think the WARN_ON can be replaced with a
> BUILD_BUG_ON(sizeof>PAGE_SIZE)?  That would avoid adding runtime
> overhead.

I am not sure about this. BUILD_BUG_ON will break kernel compiling.
Before we got the root cause several work around fix were introduced to
skip this kind of crash_note.

  c4082f3 vmcore: continue vmcore initialization if PT_NOTE is found empty
  38dfac8 vmcore: prevent PT_NOTE p_memsz overflow during header update

That means if (sizeof(note_buf_t)>PAGE_SIZE) really happened, normal
kernel works well, kdump kernel can work but we will lose those
crash_notes. And if on one certain ARCH sizeof(note_buf_t) is bigger
than PAGE_SIZE, the design here must be changed to avoid using percpu
variable or adjust their note_buf_t. That may take a not short time to
discuss and review. Comparing with this it may be better to tolerate the
dumping vmcore with uncomplete crash_notes for a while until new design
is taken.

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch v2] align crash_notes allocation to make it be inside one physical page
  2015-08-03 22:04 ` Andrew Morton
  2015-08-03 23:01   ` Baoquan He
@ 2015-08-11  6:33   ` Baoquan He
  2015-08-11  8:01     ` Minfei Huang
  1 sibling, 1 reply; 6+ messages in thread
From: Baoquan He @ 2015-08-11  6:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: ebiederm, vgoyal, dyoung, mhuang, lisa.mitchell, tatsu,
	seiji.aguchi.tr, linux-kernel, kexec

Hi Andrew,

On 08/03/15 at 03:04pm, Andrew Morton wrote:
> On Mon,  3 Aug 2015 20:50:43 +0800 Baoquan He <bhe@redhat.com> wrote:
> And I think the WARN_ON can be replaced with a
> BUILD_BUG_ON(sizeof>PAGE_SIZE)?  That would avoid adding runtime
> overhead.

Rethink about this, you are right. Using BUILD_BUG_ON is better.
Anyone who found this compiling break should check if his/her code
changes increase the crash_notes size. If possible that increase need be
avoidable. Otherwise he should report this to upstream why it's
unavoidable to increase crash_notes size, then let's consider the
redesign the crash_notes data structure. 

So I will use BUILD_BUG_ON and repost.

Thanks
Baoquan
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch v2] align crash_notes allocation to make it be inside one physical page
  2015-08-11  6:33   ` Baoquan He
@ 2015-08-11  8:01     ` Minfei Huang
  2015-08-11  8:42       ` Baoquan He
  0 siblings, 1 reply; 6+ messages in thread
From: Minfei Huang @ 2015-08-11  8:01 UTC (permalink / raw)
  To: Baoquan He
  Cc: Andrew Morton, ebiederm, vgoyal, dyoung, lisa.mitchell, tatsu,
	seiji.aguchi.tr, linux-kernel, kexec

On 08/11/15 at 02:33pm, Baoquan He wrote:
> Hi Andrew,
> 
> On 08/03/15 at 03:04pm, Andrew Morton wrote:
> > On Mon,  3 Aug 2015 20:50:43 +0800 Baoquan He <bhe@redhat.com> wrote:
> > And I think the WARN_ON can be replaced with a
> > BUILD_BUG_ON(sizeof>PAGE_SIZE)?  That would avoid adding runtime
> > overhead.
> 
> Rethink about this, you are right. Using BUILD_BUG_ON is better.
> Anyone who found this compiling break should check if his/her code
> changes increase the crash_notes size. If possible that increase need be
> avoidable. Otherwise he should report this to upstream why it's
> unavoidable to increase crash_notes size, then let's consider the
> redesign the crash_notes data structure. 
> 
> So I will use BUILD_BUG_ON and repost.

Baoquan.

If the size of notes never be exceeded to PAGE_SIZE, I think we can
revert below patch, since the situation which describes in patch does
not happen.

    commit 38dfac843cb6d7be1874888839817404a15a6b3c
    Author: Greg Pearson <greg.pearson@hp.com>
    Date:   Mon Feb 10 14:25:36 2014 -0800
    
        vmcore: prevent PT_NOTE p_memsz overflow during header update

What do you think about this?

Thanks
Minfei

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch v2] align crash_notes allocation to make it be inside one physical page
  2015-08-11  8:01     ` Minfei Huang
@ 2015-08-11  8:42       ` Baoquan He
  0 siblings, 0 replies; 6+ messages in thread
From: Baoquan He @ 2015-08-11  8:42 UTC (permalink / raw)
  To: Minfei Huang
  Cc: Andrew Morton, ebiederm, vgoyal, dyoung, lisa.mitchell, tatsu,
	seiji.aguchi.tr, linux-kernel, kexec

Hi,

On 08/11/15 at 04:01pm, Minfei Huang wrote:
> Baoquan.
> 
> If the size of notes never be exceeded to PAGE_SIZE, I think we can
> revert below patch, since the situation which describes in patch does
> not happen.
> 
>     commit 38dfac843cb6d7be1874888839817404a15a6b3c
>     Author: Greg Pearson <greg.pearson@hp.com>
>     Date:   Mon Feb 10 14:25:36 2014 -0800
>     
>         vmcore: prevent PT_NOTE p_memsz overflow during header update
> 
> What do you think about this?

Yeah, I am fine with this reverting. If people all agree with this I can
post patch to revert this. In fact I am eagerer to revert below commit
since it's  a littble bit confusing after crash_notes crossing 2
pages bug is fixed. Below commit is a work around fix , but it's too much. 
I am willing to hear people's idea.

commit 34b47764297130b21aaeb4cc6119bb811814b8e3
Author: WANG Chao <chaowang@redhat.com>
Date:   Tue Feb 17 13:46:01 2015 -0800

    vmcore: fix PT_NOTE n_namesz, n_descsz overflow issue

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-08-11  8:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-03 12:50 [Patch v2] align crash_notes allocation to make it be inside one physical page Baoquan He
2015-08-03 22:04 ` Andrew Morton
2015-08-03 23:01   ` Baoquan He
2015-08-11  6:33   ` Baoquan He
2015-08-11  8:01     ` Minfei Huang
2015-08-11  8:42       ` Baoquan He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).