linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kdump: force page alignment for per-CPU crash notes.
@ 2012-02-29 17:21 Eugene Surovegin
  2012-03-01  1:18 ` Simon Horman
  2012-03-01 15:09 ` Vivek Goyal
  0 siblings, 2 replies; 11+ messages in thread
From: Eugene Surovegin @ 2012-02-29 17:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: Eugene Surovegin, Eric Biederman, Vivek Goyal, kexec-list

Per-CPU allocations are not guaranteed to be physically contiguous.
However, kdump kernel and user-space code assumes that per-CPU
memory, used for saving CPU registers on crash, is.
This can cause corrupted /proc/vmcore in some cases - the main
symptom being huge ELF note section.

Force page alignment for note_buf_t to ensure that this assumption holds.

Signed-off-by: Eugene Surovegin <surovegin@google.com>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: kexec-list <kexec@lists.infradead.org>
---
 kernel/kexec.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 7b08867..e641b5c 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1232,8 +1232,13 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
 
 static int __init crash_notes_memory_init(void)
 {
-	/* Allocate memory for saving cpu registers. */
-	crash_notes = alloc_percpu(note_buf_t);
+	/* Allocate memory for saving cpu registers.
+	 * Force page alignment to avoid crossing physical page boundary -
+	 * kexec-tools and kernel /proc/vmcore handler assume these per-CPU
+	 * chunks are physically contiguous.
+	 */
+	crash_notes = (note_buf_t __percpu *)__alloc_percpu(sizeof(note_buf_t),
+							    PAGE_SIZE);
 	if (!crash_notes) {
 		printk("Kexec: Memory allocation for saving cpu register"
 		" states failed\n");
-- 
1.7.9.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-02-29 17:21 [PATCH] kdump: force page alignment for per-CPU crash notes Eugene Surovegin
@ 2012-03-01  1:18 ` Simon Horman
       [not found]   ` <CANrfoUXd4L2bjBO+L8ZBCkc1dfnZahn39sB25zsrUMqGA5zY5A@mail.gmail.com>
  2012-03-01 15:09 ` Vivek Goyal
  1 sibling, 1 reply; 11+ messages in thread
From: Simon Horman @ 2012-03-01  1:18 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linux-kernel, kexec-list, Eric Biederman, Vivek Goyal

On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> Per-CPU allocations are not guaranteed to be physically contiguous.
> However, kdump kernel and user-space code assumes that per-CPU
> memory, used for saving CPU registers on crash, is.
> This can cause corrupted /proc/vmcore in some cases - the main
> symptom being huge ELF note section.
> 
> Force page alignment for note_buf_t to ensure that this assumption holds.

Ouch. I'm surprised there is an allocation on crash, perhaps
it could at least be done earlier? And am I right in thinking
that this change increases the likely hood that the allocation
could fail?

> 
> Signed-off-by: Eugene Surovegin <surovegin@google.com>
> CC: Eric Biederman <ebiederm@xmission.com>
> CC: Vivek Goyal <vgoyal@redhat.com>
> CC: kexec-list <kexec@lists.infradead.org>
> ---
>  kernel/kexec.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 7b08867..e641b5c 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1232,8 +1232,13 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  
>  static int __init crash_notes_memory_init(void)
>  {
> -	/* Allocate memory for saving cpu registers. */
> -	crash_notes = alloc_percpu(note_buf_t);
> +	/* Allocate memory for saving cpu registers.
> +	 * Force page alignment to avoid crossing physical page boundary -
> +	 * kexec-tools and kernel /proc/vmcore handler assume these per-CPU
> +	 * chunks are physically contiguous.
> +	 */
> +	crash_notes = (note_buf_t __percpu *)__alloc_percpu(sizeof(note_buf_t),
> +							    PAGE_SIZE);
>  	if (!crash_notes) {
>  		printk("Kexec: Memory allocation for saving cpu register"
>  		" states failed\n");
> -- 
> 1.7.9.1
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
       [not found]   ` <CANrfoUXd4L2bjBO+L8ZBCkc1dfnZahn39sB25zsrUMqGA5zY5A@mail.gmail.com>
@ 2012-03-01  1:32     ` Simon Horman
  2012-03-01  1:39       ` Eugene Surovegin
  0 siblings, 1 reply; 11+ messages in thread
From: Simon Horman @ 2012-03-01  1:32 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linux-kernel, kexec-list, Eric Biederman, Vivek Goyal

On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <horms@verge.net.au> wrote:
> 
> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> > > However, kdump kernel and user-space code assumes that per-CPU
> > > memory, used for saving CPU registers on crash, is.
> > > This can cause corrupted /proc/vmcore in some cases - the main
> > > symptom being huge ELF note section.
> > >
> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >
> > Ouch. I'm surprised there is an allocation on crash, perhaps
> > it could at least be done earlier? And am I right in thinking
> > that this change increases the likely hood that the allocation
> > could fail?
> >
> 
> I'm not following. This allocation is done on start-up, not on crash.
> If you cannot allocate this much memory on system boot, I'm not sure what
> else you can do on this system....

Sorry, my eyes deceived me. You are correct and I agree.

Is it the case that note_buf_t is never larger than PAGE_SIZE?
If so I your patch looks good to me.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-03-01  1:32     ` Simon Horman
@ 2012-03-01  1:39       ` Eugene Surovegin
  2012-03-01  1:51         ` HATAYAMA Daisuke
  2012-03-01  2:53         ` Simon Horman
  0 siblings, 2 replies; 11+ messages in thread
From: Eugene Surovegin @ 2012-03-01  1:39 UTC (permalink / raw)
  To: Simon Horman; +Cc: linux-kernel, kexec-list, Eric Biederman, Vivek Goyal

On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <horms@verge.net.au> wrote:
> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <horms@verge.net.au> wrote:
>>
>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>> > > However, kdump kernel and user-space code assumes that per-CPU
>> > > memory, used for saving CPU registers on crash, is.
>> > > This can cause corrupted /proc/vmcore in some cases - the main
>> > > symptom being huge ELF note section.
>> > >
>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>> >
>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>> > it could at least be done earlier? And am I right in thinking
>> > that this change increases the likely hood that the allocation
>> > could fail?
>> >
>>
>> I'm not following. This allocation is done on start-up, not on crash.
>> If you cannot allocate this much memory on system boot, I'm not sure what
>> else you can do on this system....
>
> Sorry, my eyes deceived me. You are correct and I agree.
>
> Is it the case that note_buf_t is never larger than PAGE_SIZE?
> If so I your patch looks good to me.

Currently, maximum note size is hardcoded in kexec-tools to 1024
(MAX_NOTE_BYTES).
Usually it's way less. IIRC on x86_64 it's 336 bytes.

--
Eugene

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-03-01  1:39       ` Eugene Surovegin
@ 2012-03-01  1:51         ` HATAYAMA Daisuke
  2012-03-01  1:56           ` Eugene Surovegin
  2012-03-01  2:50           ` Simon Horman
  2012-03-01  2:53         ` Simon Horman
  1 sibling, 2 replies; 11+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-01  1:51 UTC (permalink / raw)
  To: surovegin; +Cc: horms, kexec, linux-kernel, vgoyal, ebiederm

From: Eugene Surovegin <surovegin@google.com>
Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
Date: Wed, 29 Feb 2012 17:39:55 -0800

> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <horms@verge.net.au> wrote:
>> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <horms@verge.net.au> wrote:
>>>
>>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>>> > > However, kdump kernel and user-space code assumes that per-CPU
>>> > > memory, used for saving CPU registers on crash, is.
>>> > > This can cause corrupted /proc/vmcore in some cases - the main
>>> > > symptom being huge ELF note section.
>>> > >
>>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>>> >
>>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>>> > it could at least be done earlier? And am I right in thinking
>>> > that this change increases the likely hood that the allocation
>>> > could fail?
>>> >
>>>
>>> I'm not following. This allocation is done on start-up, not on crash.
>>> If you cannot allocate this much memory on system boot, I'm not sure what
>>> else you can do on this system....
>>
>> Sorry, my eyes deceived me. You are correct and I agree.
>>
>> Is it the case that note_buf_t is never larger than PAGE_SIZE?
>> If so I your patch looks good to me.
> 
> Currently, maximum note size is hardcoded in kexec-tools to 1024
> (MAX_NOTE_BYTES).
> Usually it's way less. IIRC on x86_64 it's 336 bytes.
> 

This is elf_prstatus and I guess it's mostly equal to registers.

crash> p sizeof(struct elf_prstatus)
$3 = 336
crash> ptype struct elf_prstatus
type = struct elf_prstatus {
    struct elf_siginfo pr_info;
    short int pr_cursig;
    long unsigned int pr_sigpend;
    long unsigned int pr_sighold;
    pid_t pr_pid;
    pid_t pr_ppid;
    pid_t pr_pgrp;
    pid_t pr_sid;
    struct timeval pr_utime;
    struct timeval pr_stime;
    struct timeval pr_cutime;
    struct timeval pr_cstime;
    elf_gregset_t pr_reg; <-- this
    int pr_fpvalid;
}

What kinds of architecture does have so many registers? It's just my
interest. Or possibly other kinds of notes is written here?

Thanks.
HATAYAMA, Daisuke


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-03-01  1:51         ` HATAYAMA Daisuke
@ 2012-03-01  1:56           ` Eugene Surovegin
  2012-03-01  2:50           ` Simon Horman
  1 sibling, 0 replies; 11+ messages in thread
From: Eugene Surovegin @ 2012-03-01  1:56 UTC (permalink / raw)
  To: HATAYAMA Daisuke; +Cc: horms, kexec, linux-kernel, vgoyal, ebiederm

On Wed, Feb 29, 2012 at 5:51 PM, HATAYAMA Daisuke
<d.hatayama@jp.fujitsu.com> wrote:
> From: Eugene Surovegin <surovegin@google.com>
> Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
> Date: Wed, 29 Feb 2012 17:39:55 -0800
>
>> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <horms@verge.net.au> wrote:
>>> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
>>>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <horms@verge.net.au> wrote:
>>>>
>>>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>>>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
>>>> > > However, kdump kernel and user-space code assumes that per-CPU
>>>> > > memory, used for saving CPU registers on crash, is.
>>>> > > This can cause corrupted /proc/vmcore in some cases - the main
>>>> > > symptom being huge ELF note section.
>>>> > >
>>>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
>>>> >
>>>> > Ouch. I'm surprised there is an allocation on crash, perhaps
>>>> > it could at least be done earlier? And am I right in thinking
>>>> > that this change increases the likely hood that the allocation
>>>> > could fail?
>>>> >
>>>>
>>>> I'm not following. This allocation is done on start-up, not on crash.
>>>> If you cannot allocate this much memory on system boot, I'm not sure what
>>>> else you can do on this system....
>>>
>>> Sorry, my eyes deceived me. You are correct and I agree.
>>>
>>> Is it the case that note_buf_t is never larger than PAGE_SIZE?
>>> If so I your patch looks good to me.
>>
>> Currently, maximum note size is hardcoded in kexec-tools to 1024
>> (MAX_NOTE_BYTES).
>> Usually it's way less. IIRC on x86_64 it's 336 bytes.
>>
>
> This is elf_prstatus and I guess it's mostly equal to registers.
>
> crash> p sizeof(struct elf_prstatus)
> $3 = 336
> crash> ptype struct elf_prstatus
> type = struct elf_prstatus {
>    struct elf_siginfo pr_info;
>    short int pr_cursig;
>    long unsigned int pr_sigpend;
>    long unsigned int pr_sighold;
>    pid_t pr_pid;
>    pid_t pr_ppid;
>    pid_t pr_pgrp;
>    pid_t pr_sid;
>    struct timeval pr_utime;
>    struct timeval pr_stime;
>    struct timeval pr_cutime;
>    struct timeval pr_cstime;
>    elf_gregset_t pr_reg; <-- this
>    int pr_fpvalid;
> }
>
> What kinds of architecture does have so many registers? It's just my
> interest. Or possibly other kinds of notes is written here?

I'm not sure about other archs, but we don't write there anything
except for 'elf_prstatus' and sentinel "final" note.

--
Eugene

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-03-01  1:51         ` HATAYAMA Daisuke
  2012-03-01  1:56           ` Eugene Surovegin
@ 2012-03-01  2:50           ` Simon Horman
  1 sibling, 0 replies; 11+ messages in thread
From: Simon Horman @ 2012-03-01  2:50 UTC (permalink / raw)
  To: HATAYAMA Daisuke; +Cc: surovegin, kexec, linux-kernel, vgoyal, ebiederm

On Thu, Mar 01, 2012 at 10:51:26AM +0900, HATAYAMA Daisuke wrote:
> From: Eugene Surovegin <surovegin@google.com>
> Subject: Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
> Date: Wed, 29 Feb 2012 17:39:55 -0800
> 
> > On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <horms@verge.net.au> wrote:
> >> On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> >>> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <horms@verge.net.au> wrote:
> >>>
> >>> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> >>> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> >>> > > However, kdump kernel and user-space code assumes that per-CPU
> >>> > > memory, used for saving CPU registers on crash, is.
> >>> > > This can cause corrupted /proc/vmcore in some cases - the main
> >>> > > symptom being huge ELF note section.
> >>> > >
> >>> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >>> >
> >>> > Ouch. I'm surprised there is an allocation on crash, perhaps
> >>> > it could at least be done earlier? And am I right in thinking
> >>> > that this change increases the likely hood that the allocation
> >>> > could fail?
> >>> >
> >>>
> >>> I'm not following. This allocation is done on start-up, not on crash.
> >>> If you cannot allocate this much memory on system boot, I'm not sure what
> >>> else you can do on this system....
> >>
> >> Sorry, my eyes deceived me. You are correct and I agree.
> >>
> >> Is it the case that note_buf_t is never larger than PAGE_SIZE?
> >> If so I your patch looks good to me.
> > 
> > Currently, maximum note size is hardcoded in kexec-tools to 1024
> > (MAX_NOTE_BYTES).
> > Usually it's way less. IIRC on x86_64 it's 336 bytes.

I presume that MAX_NOTE_BYTES was chosen to be large to
minimise the chance that it would ever need to be changed.

> This is elf_prstatus and I guess it's mostly equal to registers.
> 
> crash> p sizeof(struct elf_prstatus)
> $3 = 336
> crash> ptype struct elf_prstatus
> type = struct elf_prstatus {
>     struct elf_siginfo pr_info;
>     short int pr_cursig;
>     long unsigned int pr_sigpend;
>     long unsigned int pr_sighold;
>     pid_t pr_pid;
>     pid_t pr_ppid;
>     pid_t pr_pgrp;
>     pid_t pr_sid;
>     struct timeval pr_utime;
>     struct timeval pr_stime;
>     struct timeval pr_cutime;
>     struct timeval pr_cstime;
>     elf_gregset_t pr_reg; <-- this
>     int pr_fpvalid;
> }
> 
> What kinds of architecture does have so many registers? It's just my
> interest. Or possibly other kinds of notes is written here?

The winner seems to be ia64 with 128.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-03-01  1:39       ` Eugene Surovegin
  2012-03-01  1:51         ` HATAYAMA Daisuke
@ 2012-03-01  2:53         ` Simon Horman
  1 sibling, 0 replies; 11+ messages in thread
From: Simon Horman @ 2012-03-01  2:53 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linux-kernel, kexec-list, Eric Biederman, Vivek Goyal

On Wed, Feb 29, 2012 at 05:39:55PM -0800, Eugene Surovegin wrote:
> On Wed, Feb 29, 2012 at 5:32 PM, Simon Horman <horms@verge.net.au> wrote:
> > On Wed, Feb 29, 2012 at 05:23:10PM -0800, Eugene Surovegin wrote:
> >> On Wed, Feb 29, 2012 at 5:18 PM, Simon Horman <horms@verge.net.au> wrote:
> >>
> >> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> >> > > Per-CPU allocations are not guaranteed to be physically contiguous.
> >> > > However, kdump kernel and user-space code assumes that per-CPU
> >> > > memory, used for saving CPU registers on crash, is.
> >> > > This can cause corrupted /proc/vmcore in some cases - the main
> >> > > symptom being huge ELF note section.
> >> > >
> >> > > Force page alignment for note_buf_t to ensure that this assumption holds.
> >> >
> >> > Ouch. I'm surprised there is an allocation on crash, perhaps
> >> > it could at least be done earlier? And am I right in thinking
> >> > that this change increases the likely hood that the allocation
> >> > could fail?
> >> >
> >>
> >> I'm not following. This allocation is done on start-up, not on crash.
> >> If you cannot allocate this much memory on system boot, I'm not sure what
> >> else you can do on this system....
> >
> > Sorry, my eyes deceived me. You are correct and I agree.
> >
> > Is it the case that note_buf_t is never larger than PAGE_SIZE?
> > If so I your patch looks good to me.
> 
> Currently, maximum note size is hardcoded in kexec-tools to 1024
> (MAX_NOTE_BYTES).
> Usually it's way less. IIRC on x86_64 it's 336 bytes.

Ok, understood.

Reviewed-by: Simon Horman <horms@verge.net.au>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-02-29 17:21 [PATCH] kdump: force page alignment for per-CPU crash notes Eugene Surovegin
  2012-03-01  1:18 ` Simon Horman
@ 2012-03-01 15:09 ` Vivek Goyal
  2012-03-01 17:04   ` Eugene Surovegin
  1 sibling, 1 reply; 11+ messages in thread
From: Vivek Goyal @ 2012-03-01 15:09 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linux-kernel, Eric Biederman, kexec-list

On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> Per-CPU allocations are not guaranteed to be physically contiguous.
> However, kdump kernel and user-space code assumes that per-CPU
> memory, used for saving CPU registers on crash, is.
> This can cause corrupted /proc/vmcore in some cases - the main
> symptom being huge ELF note section.
> 
> Force page alignment for note_buf_t to ensure that this assumption holds.
> 

Hi Eugene,

Where do we make assumption that crash_notes address is page aligned?

In first kernel we save notes using virtual addresses as returned by
per_cpu_ptr(). So first kernel should be fine (crash_save_cpu()).

In second kernel, fs/proc/vmcore.c code does not seem to be assuming 
that notes are stored as page aligned address.

merge_note_headers_elf64()
  read_from_oldmem()
       offset = (unsigned long)(*ppos % PAGE_SIZE);
        pfn = (unsigned long)(*ppos / PAGE_SIZE);

We see to calculate pfn and offset inside the page. Map the pfn and then
access offset.

So as long as whole of the note section is stored on a single physical
page it is not a problem.

Are you referring to the fact that note could be stored on two different
physical pages and then kexec-tools does not know about the physical
address of second page, hence second kernel does not know about it
and we never read that data. That sounds like a problem.

So it make sense to force the page size alignment and if notes ever
grow beyond 1 page, we need to come up with more complex ways of
communicating more than 1 physical address to second kernel.

Thanks
Vivek


> Signed-off-by: Eugene Surovegin <surovegin@google.com>
> CC: Eric Biederman <ebiederm@xmission.com>
> CC: Vivek Goyal <vgoyal@redhat.com>
> CC: kexec-list <kexec@lists.infradead.org>
> ---
>  kernel/kexec.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 7b08867..e641b5c 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1232,8 +1232,13 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  
>  static int __init crash_notes_memory_init(void)
>  {
> -	/* Allocate memory for saving cpu registers. */
> -	crash_notes = alloc_percpu(note_buf_t);
> +	/* Allocate memory for saving cpu registers.
> +	 * Force page alignment to avoid crossing physical page boundary -
> +	 * kexec-tools and kernel /proc/vmcore handler assume these per-CPU
> +	 * chunks are physically contiguous.
> +	 */
> +	crash_notes = (note_buf_t __percpu *)__alloc_percpu(sizeof(note_buf_t),
> +							    PAGE_SIZE);
>  	if (!crash_notes) {
>  		printk("Kexec: Memory allocation for saving cpu register"
>  		" states failed\n");
> -- 
> 1.7.9.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-03-01 15:09 ` Vivek Goyal
@ 2012-03-01 17:04   ` Eugene Surovegin
  2012-03-01 18:31     ` Vivek Goyal
  0 siblings, 1 reply; 11+ messages in thread
From: Eugene Surovegin @ 2012-03-01 17:04 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel, Eric Biederman, kexec-list

On Thu, Mar 1, 2012 at 7:09 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
>> Per-CPU allocations are not guaranteed to be physically contiguous.
>> However, kdump kernel and user-space code assumes that per-CPU
>> memory, used for saving CPU registers on crash, is.
>> This can cause corrupted /proc/vmcore in some cases - the main
>> symptom being huge ELF note section.
>>
>> Force page alignment for note_buf_t to ensure that this assumption holds.
>>
>
> Hi Eugene,
>
> Where do we make assumption that crash_notes address is page aligned?

I never said that. I said that assumption was that note chunks were
physically contiguous.

[snip]

> Are you referring to the fact that note could be stored on two different
> physical pages and then kexec-tools does not know about the physical
> address of second page, hence second kernel does not know about it
> and we never read that data. That sounds like a problem.

Yes. This is exactly what happens in some cases.

> So it make sense to force the page size alignment and if notes ever
> grow beyond 1 page, we need to come up with more complex ways of
> communicating more than 1 physical address to second kernel.

Yes. Thankfully notes so far are smaller than a page.
And if they do grow larger, than kexec-tools have to be modified
anyways - currently maximum note size (1K) is hardcoded there.
If this happens, I think instead of coming up with a way to
communicate phys to virt mapping, it'd be easier just to change th way
notes are allocated in the kernel so they are phys contiguous.

--
Eugene

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kdump: force page alignment for per-CPU crash notes.
  2012-03-01 17:04   ` Eugene Surovegin
@ 2012-03-01 18:31     ` Vivek Goyal
  0 siblings, 0 replies; 11+ messages in thread
From: Vivek Goyal @ 2012-03-01 18:31 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linux-kernel, Eric Biederman, kexec-list

On Thu, Mar 01, 2012 at 09:04:32AM -0800, Eugene Surovegin wrote:
> On Thu, Mar 1, 2012 at 7:09 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Feb 29, 2012 at 09:21:23AM -0800, Eugene Surovegin wrote:
> >> Per-CPU allocations are not guaranteed to be physically contiguous.
> >> However, kdump kernel and user-space code assumes that per-CPU
> >> memory, used for saving CPU registers on crash, is.
> >> This can cause corrupted /proc/vmcore in some cases - the main
> >> symptom being huge ELF note section.
> >>
> >> Force page alignment for note_buf_t to ensure that this assumption holds.
> >>
> >
> > Hi Eugene,
> >
> > Where do we make assumption that crash_notes address is page aligned?
> 
> I never said that. I said that assumption was that note chunks were
> physically contiguous.
> 
> [snip]
> 
> > Are you referring to the fact that note could be stored on two different
> > physical pages and then kexec-tools does not know about the physical
> > address of second page, hence second kernel does not know about it
> > and we never read that data. That sounds like a problem.
> 
> Yes. This is exactly what happens in some cases.

Ok. Thanks for clarifying. Patch looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-03-01 18:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-29 17:21 [PATCH] kdump: force page alignment for per-CPU crash notes Eugene Surovegin
2012-03-01  1:18 ` Simon Horman
     [not found]   ` <CANrfoUXd4L2bjBO+L8ZBCkc1dfnZahn39sB25zsrUMqGA5zY5A@mail.gmail.com>
2012-03-01  1:32     ` Simon Horman
2012-03-01  1:39       ` Eugene Surovegin
2012-03-01  1:51         ` HATAYAMA Daisuke
2012-03-01  1:56           ` Eugene Surovegin
2012-03-01  2:50           ` Simon Horman
2012-03-01  2:53         ` Simon Horman
2012-03-01 15:09 ` Vivek Goyal
2012-03-01 17:04   ` Eugene Surovegin
2012-03-01 18:31     ` Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).