All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC] KEXEC: allocate crash note buffers at boot time
  2011-11-29 18:56 [RFC] KEXEC: allocate crash note buffers at boot time Andrew Cooper
@ 2011-11-29 11:19 ` Keir Fraser
  2011-11-30 13:14   ` Andrew Cooper
  2011-11-30  9:20 ` [RFC] KEXEC: allocate crash note buffers at boot time Jan Beulich
  1 sibling, 1 reply; 23+ messages in thread
From: Keir Fraser @ 2011-11-29 11:19 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: xen-devel

On 29/11/2011 18:56, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:

> Hello,
> 
> As I have little to no knowledge of this stage of the boot process, is
> this a sensible way to be setting up the per_cpu areas?  I have a
> sneaking suspicion that it will fall over if a CPU is onlined after
> boot, and may also fall over if a CPU is offlined and reonlined later.
> There appears to be no infrastructure currently in place for this type
> of initialization, which is quite possibly why the code exists in its
> current form.

No it's bad. For starters you should use for_each_online_cpu, not do an
open-coded for-loop. Secondly you should register a cpu hotplug notifier
(register_cpu_notifier()) to pick up and handle future
CPU_UP_PREPARE/CPU_UP_CANCELED/CPU_DEAD events. This can be hung off an
__initcall. See common/stop_machine.c for example, or common/timer.c, which
doesn't even require a for_each_online_cpu loop because its init code gets
run before we bring up secondary CPUs.

 -- Keir 

> Thanks,

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC] KEXEC: allocate crash note buffers at boot time
@ 2011-11-29 18:56 Andrew Cooper
  2011-11-29 11:19 ` Keir Fraser
  2011-11-30  9:20 ` [RFC] KEXEC: allocate crash note buffers at boot time Jan Beulich
  0 siblings, 2 replies; 23+ messages in thread
From: Andrew Cooper @ 2011-11-29 18:56 UTC (permalink / raw)
  To: Keir Fraser, Jan Beulich; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

Hello,

As I have little to no knowledge of this stage of the boot process, is
this a sensible way to be setting up the per_cpu areas?  I have a
sneaking suspicion that it will fall over if a CPU is onlined after
boot, and may also fall over if a CPU is offlined and reonlined later. 
There appears to be no infrastructure currently in place for this type
of initialization, which is quite possibly why the code exists in its
current form.

Thanks,

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 4195 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 0a0c02a61676 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -154,6 +154,50 @@ void __init set_kexec_crash_area_size(u6
     }
 }
 
+/* Allocate Xen Crash Note buffers. */
+static __init int kexec_notes_init(void)
+{
+    int c;
+    Elf_Note * note;
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    int nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_into note. */
+    int cpu0_nr_bytes = nr_bytes +
+        sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    for ( c = 0; c < num_online_cpus(); ++c )
+    {
+        note = xmalloc_bytes( c ? nr_bytes : cpu0_nr_bytes );
+        if ( ! note )
+            return -ENOMEM;
+
+        per_cpu(crash_notes, c) = note;
+
+        /* Setup CORE note. */
+        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+        note = ELFNOTE_NEXT(note);
+
+        /* Setup Xen CORE note. */
+        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+                   sizeof(crash_xen_core_t));
+
+        if ( 0 == c )
+        {
+            /* Set up Xen Crash Info note. */
+            xen_crash_note = note = ELFNOTE_NEXT(note);
+            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                       sizeof(crash_xen_info_t));
+        }
+    }
+
+    return 0;
+}
+__initcall(kexec_notes_init);
+
 static void one_cpu_only(void)
 {
     /* Only allow the first cpu to continue - force other cpus to spin */
@@ -306,30 +350,6 @@ static int kexec_get_cpu(xen_kexec_range
     if ( nr == 0 )
         nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
 
-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
     range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
     range->size = nr_bytes;
     return 0;

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time
  2011-11-29 18:56 [RFC] KEXEC: allocate crash note buffers at boot time Andrew Cooper
  2011-11-29 11:19 ` Keir Fraser
@ 2011-11-30  9:20 ` Jan Beulich
  2011-11-30 14:01   ` Andrew Cooper
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2011-11-30  9:20 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Keir Fraser

>>> On 29.11.11 at 19:56, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> As I have little to no knowledge of this stage of the boot process, is
> this a sensible way to be setting up the per_cpu areas?  I have a
> sneaking suspicion that it will fall over if a CPU is onlined after
> boot, and may also fall over if a CPU is offlined and reonlined later. 
> There appears to be no infrastructure currently in place for this type
> of initialization, which is quite possibly why the code exists in its
> current form.

I actually wonder how you came to those 4 statements you make in
the description - none of these seem to me like they are really an
issue (this would instead be plain bugs in Dom0). Did you actually look
at the existing Dom0 implementation(s)?

Further, while not being a huge waste of memory, it still is one in case
kexec gets never enabled, especially when considering a Dom0 kernel
that's being built without CONFIG_KEXEC (or an incapable on, like any
pv-ops kernel to date). So I also conceptually question that change.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time
  2011-11-29 11:19 ` Keir Fraser
@ 2011-11-30 13:14   ` Andrew Cooper
  2011-11-30 17:24     ` [RFC] KEXEC: allocate crash note buffers at boot time v2 Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-11-30 13:14 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On 29/11/11 11:19, Keir Fraser wrote:
> On 29/11/2011 18:56, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>
>> Hello,
>>  
>> As I have little to no knowledge of this stage of the boot process, is
>> this a sensible way to be setting up the per_cpu areas?  I have a
>> sneaking suspicion that it will fall over if a CPU is onlined after
>> boot, and may also fall over if a CPU is offlined and reonlined later.
>> There appears to be no infrastructure currently in place for this type
>> of initialization, which is quite possibly why the code exists in its
>> current form.
> No it's bad. For starters you should use for_each_online_cpu, not do an
> open-coded for-loop. Secondly you should register a cpu hotplug notifier
> (register_cpu_notifier()) to pick up and handle future
> CPU_UP_PREPARE/CPU_UP_CANCELED/CPU_DEAD events. This can be hung off an
> __initcall. See common/stop_machine.c for example, or common/timer.c, which
> doesn't even require a for_each_online_cpu loop because its init code gets
> run before we bring up secondary CPUs.
>
>  -- Keir 
>

Thanks - this is exactly what I was looking for.

Sadly, after thinking about cpu hotplug safety, it is not safe to store
the crash note pointer in the per cpu data.  Once you have reported an
area to dom0 via KEXEC_CMD_get_reserve, the areas reported cant possibly
move.

Therefore, I need to redesign a little bit.

Thanks,

~Andrew

>> Thanks,
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time
  2011-11-30  9:20 ` [RFC] KEXEC: allocate crash note buffers at boot time Jan Beulich
@ 2011-11-30 14:01   ` Andrew Cooper
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Cooper @ 2011-11-30 14:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir (Xen.org)

On 30/11/11 09:20, Jan Beulich wrote:
>>>> On 29.11.11 at 19:56, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> As I have little to no knowledge of this stage of the boot process, is
>> this a sensible way to be setting up the per_cpu areas?  I have a
>> sneaking suspicion that it will fall over if a CPU is onlined after
>> boot, and may also fall over if a CPU is offlined and reonlined later. 
>> There appears to be no infrastructure currently in place for this type
>> of initialization, which is quite possibly why the code exists in its
>> current form.
> I actually wonder how you came to those 4 statements you make in
> the description - none of these seem to me like they are really an
> issue (this would instead be plain bugs in Dom0). Did you actually look
> at the existing Dom0 implementation(s)?
>
> Further, while not being a huge waste of memory, it still is one in case
> kexec gets never enabled, especially when considering a Dom0 kernel
> that's being built without CONFIG_KEXEC (or an incapable on, like any
> pv-ops kernel to date). So I also conceptually question that change.
>
> Jan

We (XenServer) have had many cases of the kexec path failing on customer
boxes under weird and seemingly inexplicable circumstances.  This is why
I am working on trying to bullet-proofing the entire path.

We have 1 ticket where the contents of a crash note is clearly bogus (a
PRSTATUS is not 2GB long).  We have a ticket where it appears that the
kdump kernel has failed to reassemble /proc/vmcore from elfcorehdr as a
few pcpus worth of crash notes are missing.  I seem to remember a ticket
from before my time with a crash while writing the crash notes in Xen
itself. We even have a ticket stating that you get different crash notes
depending on whether you crash using the Xen debug keys (crash notes
appear completely bogus) or /proc/sysrq-trigger in dom0 (seems to be fine).

All of these are uncertain on reproducibility (except the final one
which was shown to reproduce on Xen-3.x and not on Xen-4.x so was not
investigated further) and have a habit of being unreproducible on any of
our local hardware, which makes fixing the problems tricky.

So yes -  the 4 points I have made are certainly not regular or common
behavior, but given some of the tickets we have, I am fairly sure it is
not a bug-free path.  I have checked the 2.6.32 implementation of dom0's
side of this and agree that it looks ok.  However, it is my opinion that
relying on a certain hypercalling pattern from dom0 is a perilous route
for Xen, whether it is likely for that pattern to change in the future
or not.

Having said all of this, I agree with your second paragraph.  As already
noted in my other email in this thread, I need to change the
implementation of this, so I will key the initial allocation of memory
on whether crashkernel= has been passed.  This should be sufficient
indication as to whether the user minds having the space allocated or not.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC] KEXEC: allocate crash note buffers at boot time v2
  2011-11-30 13:14   ` Andrew Cooper
@ 2011-11-30 17:24     ` Andrew Cooper
  2011-12-01  9:08       ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-11-30 17:24 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 2564 bytes --]

Hello,

Presented is version 2 of this patch, which uses cpu hotplug
notifications to be rather safer when allocating buffers.

It occurs to me that there is a bit of an API problem with it comes to
combining a crashdump kernel with potential hotplug events.

If dom0 does not get notification of new/removed pcpus, and if it
doesn't re-evaluate /proc/iomem by recalling things like
KEXEC_CMD_get_range, then subsequent calls to /sbin/kexec -p will bundle
up stale information into the kdump magic bundle.

Even if dom0 does get a notification of pcpu hotplug events, and it
updates its iomem map, /sbin/kexec would still need to be called to
bundle the new information into the kdump magic bundle.  Possibly doable
off a udev CPU hotplug event.

Even if /sbin/kexec gets called after hotplug events, a crash before the
new kexec magic bundle has been loaded will still result in the old
bundle being used, with its stale information regarding the position and
number of crash notes.

Sadly, I don't see any easy or sensible solution to these problems. 
However, it is probably worth knowing as a potential limitation.

I have worked around this problem by never deallocating crash notes, so
even if information is stale as to the number of crash notes to be
expected, the stale physical addresses still point to allocated notes
buffers which have been partially initialized to have sensible note
headers in.  This unfortunately means that an offlined cpu which was
present at boot time will have a notes section with zero'd data.  It
also means that a cpu onlined after boot which crashes will not have its
register state presented to the kdump environment.

I would like to hope that both of these cases are unlikely, but again,
it is still a potential limitation.

The cpu crash notes themselves (PRSTATUS and XEN_ELFNOTE_CRASH_REGS)
don't contain a reference to which pcpu they represent.  This is
inferred by /sbin/kexec from the order in which they appear in dom0's
/proc/iomem, meaning that the kdump environments idea of which pcpu is
which might differ from Xen's. This depending on whether Xen allocates
the notes buffer in ascending order, whether dom0 sorts memory addresses
reported, and, as it currently gets it correct, whether either of these
behaviors change in the future.

This final issue is within my ability to fix and I will do so in the
near future, along with some other additions I already need to make to
the per cpu crash notes.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 8621 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot v2

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.

 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.

 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r df7cec2c6c03 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,13 +25,14 @@
 #include <xen/kexec.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
+#include <xen/cpu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif
 
 bool_t kexecing = FALSE;
 
-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
+static void * crash_notes[NR_CPUS];
 
 static Elf_Note *xen_crash_note;
 
@@ -165,7 +166,7 @@ static void one_cpu_only(void)
 void kexec_crash_save_cpu(void)
 {
     int cpu = smp_processor_id();
-    Elf_Note *note = per_cpu(crash_notes, cpu);
+    Elf_Note *note = crash_notes[cpu];
     ELF_Prstatus *prstatus;
     crash_xen_core_t *xencore;
 
@@ -245,25 +246,6 @@ static long kexec_reboot(void *_image)
     return 0;
 }
 
-static void do_crashdump_trigger(unsigned char key)
-{
-    printk("'%c' pressed -> triggering crashdump\n", key);
-    kexec_crash();
-    printk(" * no crash kernel loaded!\n");
-}
-
-static struct keyhandler crashdump_trigger_keyhandler = {
-    .u.fn = do_crashdump_trigger,
-    .desc = "trigger a crashdump"
-};
-
-static __init int register_crashdump_trigger(void)
-{
-    register_keyhandler('C', &crashdump_trigger_keyhandler);
-    return 0;
-}
-__initcall(register_crashdump_trigger);
-
 static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
 {
     int l = strlen(name) + 1;
@@ -280,6 +262,110 @@ static int sizeof_note(const char *name,
             ELFNOTE_ALIGN(descsz));
 }
 
+/* Allocate a crash note buffer for a newly onlined cpu. */
+static int kexec_init_cpu_notes(const int cpu)
+{
+    Elf_Note * note;
+    int nr_bytes = 0;
+
+    BUG_ON( cpu < 0 || cpu >= NR_CPUS );
+
+    /* If already allocated, nothing to do. */
+    if ( crash_notes[cpu] )
+        return 0;
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_info note. */
+    if ( 0 == cpu )
+        nr_bytes = nr_bytes +
+            sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    note = xmalloc_bytes(nr_bytes);
+    if ( ! note )
+        /* Ideally, this would be -ENOMEM.  However, there are more problems
+         * assocated with trying to deal with -ENOMEM sensibly than just
+         * pretending that the crash note area doesn't exist. */
+        return 0;
+
+    crash_notes[cpu] = note;
+
+    /* Setup CORE note. */
+    setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+    note = ELFNOTE_NEXT(note);
+
+    /* Setup Xen CORE note. */
+    setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+               sizeof(crash_xen_core_t));
+
+    if ( 0 == cpu )
+    {
+        /* Set up Xen Crash Info note. */
+        xen_crash_note = note = ELFNOTE_NEXT(note);
+        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                   sizeof(crash_xen_info_t));
+    }
+
+    return 0;
+}
+
+static void do_crashdump_trigger(unsigned char key)
+{
+    printk("'%c' pressed -> triggering crashdump\n", key);
+    kexec_crash();
+    printk(" * no crash kernel loaded!\n");
+}
+
+static struct keyhandler crashdump_trigger_keyhandler = {
+    .u.fn = do_crashdump_trigger,
+    .desc = "trigger a crashdump"
+};
+
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+
+    /* Only hook on CPU_UP_PREPARE because once a crash_note has been reported
+     * to dom0, it must keep it around in case of a crash, as the crash kernel
+     * will be hard coded to the original physical address reported. */
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        kexec_init_cpu_notes(cpu);
+        break;
+    default:
+        break;
+    }
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+static int __init kexec_init(void)
+{
+    void *cpu = (void *)(long)smp_processor_id();
+
+    register_keyhandler('C', &crashdump_trigger_keyhandler);
+
+    /* If no crash area, no need to allocate space for notes. */
+    if ( 0 == kexec_crash_area.size )
+        return 0;
+
+    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+/* The reason for this to be a presmp_initcall as opposed to a regular
+ * __initcall is to allow the setup of the cpu hotplug handler before APs are
+ * brought up. */
+presmp_initcall(kexec_init);
+
 static int kexec_get_reserve(xen_kexec_range_t *range)
 {
     if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
@@ -296,7 +382,7 @@ static int kexec_get_cpu(xen_kexec_range
     int nr = range->nr;
     int nr_bytes = 0;
 
-    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
+    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] )
         return -EINVAL;
 
     nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
@@ -306,31 +392,7 @@ static int kexec_get_cpu(xen_kexec_range
     if ( nr == 0 )
         nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
 
-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
-    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
+    range->start = __pa((unsigned long)crash_notes[nr]);
     range->size = nr_bytes;
     return 0;
 }

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01 12:56               ` Jan Beulich
@ 2011-12-01  5:20                 ` Keir Fraser
  2011-12-01 14:00                   ` Andrew Cooper
  2011-12-01 13:59                 ` Andrew Cooper
  2011-12-01 15:02                 ` Andrew Cooper
  2 siblings, 1 reply; 23+ messages in thread
From: Keir Fraser @ 2011-12-01  5:20 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel

On 01/12/2011 12:56, "Jan Beulich" <JBeulich@suse.com> wrote:

>> +    register_keyhandler('C', &crashdump_trigger_keyhandler);
>> +
>> +    /* If no crash area, no need to allocate space for notes. */
>> +    if ( 0 == kexec_crash_area.size )
>> +        return 0;
> 
> Wouldn't it make sense to switch the order of these?

I also really hate this constant-first ordering.

 -- Keir

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v2
  2011-11-30 17:24     ` [RFC] KEXEC: allocate crash note buffers at boot time v2 Andrew Cooper
@ 2011-12-01  9:08       ` Jan Beulich
  2011-12-01  9:49         ` Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2011-12-01  9:08 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

>>> On 30.11.11 at 18:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
>+static void * crash_notes[NR_CPUS];

If at all possible, can you please allocate this dynamically using
nr_cpu_ids for the size?

>+static int kexec_init_cpu_notes(const int cpu)

If the function's argument was made 'unsigned int' (as it already is in
its caller), then ...

>+    BUG_ON( cpu < 0 || cpu >= NR_CPUS );

... only the second check would be needed. Furthermore, it's
pointless to use signed types for CPU numbers (and it's
inefficient on various architectures), despite there being numerous
(bad) examples throughout the tree.

>+    if ( 0 == cpu )
>+        nr_bytes = nr_bytes +
>+            sizeof_note("Xen", sizeof(crash_xen_info_t));

Minor nit: Why not use += here (presumably allowing the statement to
fit on one line)?

>+    if ( ! note )
>+        /* Ideally, this would be -ENOMEM.  However, there are more problems
>+         * assocated with trying to deal with -ENOMEM sensibly than just
>+         * pretending that the crash note area doesn't exist. */
>+        return 0;

The only current caller ignores the return value, so why the bogus
return value?

>+    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] )
>...
>-    if ( per_cpu(crash_notes, nr) == NULL )
>-    {
>...
>-    }

I would suggest to retry allocation here. Even if this isn't suitable for
a 32-bit kdump kernel on large systems, there#s no reason to penalize
fully 64-bit environment.

And here it would also become meaningful that kexec_init_cpu_notes()
actually returns a meaningful error upon failure.

Also, I can't see the reason for the patch to move around
do_crashdump_trigger() and crashdump_trigger_keyhandler.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v2
  2011-12-01  9:08       ` Jan Beulich
@ 2011-12-01  9:49         ` Andrew Cooper
  2011-12-01 10:01           ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-12-01  9:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

On 01/12/11 09:08, Jan Beulich wrote:
>>>> On 30.11.11 at 18:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> -static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
>> +static void * crash_notes[NR_CPUS];
> If at all possible, can you please allocate this dynamically using
> nr_cpu_ids for the size?

Ok

>> +static int kexec_init_cpu_notes(const int cpu)
> If the function's argument was made 'unsigned int' (as it already is in
> its caller), then ...
>
>> +    BUG_ON( cpu < 0 || cpu >= NR_CPUS );
> ... only the second check would be needed. Furthermore, it's
> pointless to use signed types for CPU numbers (and it's
> inefficient on various architectures), despite there being numerous
> (bad) examples throughout the tree.

Good point.  I will clean this up

>> +    if ( 0 == cpu )
>> +        nr_bytes = nr_bytes +
>> +            sizeof_note("Xen", sizeof(crash_xen_info_t));
> Minor nit: Why not use += here (presumably allowing the statement to
> fit on one line)?

Because the next few patches in my series adds a new crash notes at this
point.  I felt it was neater to leave in this form for a reduced diff
next patch, and gcc will optimize it to +=

>> +    if ( ! note )
>> +        /* Ideally, this would be -ENOMEM.  However, there are more problems
>> +         * assocated with trying to deal with -ENOMEM sensibly than just
>> +         * pretending that the crash note area doesn't exist. */
>> +        return 0;
> The only current caller ignores the return value, so why the bogus
> return value?

Originally it passed the return value back up to the cpu hotplug
notifier, but I decided that causing an -ENOMEM at this point was a
little silly given that:
1) a lack of mem on boot will quickly kill the host in other ways, and
2) while there is no way useful way to ensure that the crashdump kernel
gets reloaded with new notes pointers, blocking on not being able to
allocate notes is silly.

Would you consider this enough of a problem to actually fail the
CPU_PREPARE_UP ?

On further thought from my musings yesterday, it would be fantastic if
we were able to point the kdump kernel at a PT_NOTE header without
discerned length, at which point Xen can rewrite its crash notes without
having to get /sbin/kexec to repackage its magic elf blog pointing to
new/different memory locations.  However, as this would involve changes
to kexec-tools, Linux (and Xen) in a not trivially backward compatible
fashion, the chances of it happening are slim.

>> +    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] )
>> ...
>> -    if ( per_cpu(crash_notes, nr) == NULL )
>> -    {
>> ...
>> -    }
> I would suggest to retry allocation here. Even if this isn't suitable for
> a 32-bit kdump kernel on large systems, there#s no reason to penalize
> fully 64-bit environment.

At this point, none of the allocation makes it suitable for a 32bit
system.  For that, I need to start investigating something akin to
xalloc_below(), which is probably going to be todays task.  If we have
previously failed to allocate the notes buffer (and are somehow still
running), what if anything makes it likely that we will successfully
allocate memory this time?

> And here it would also become meaningful that kexec_init_cpu_notes()
> actually returns a meaningful error upon failure.
> Also, I can't see the reason for the patch to move around
> do_crashdump_trigger() and crashdump_trigger_keyhandler.

This is probably collateral damage from having to reorder sizeof_note()
and setup_note() in preference to making a forward declaration of them. 
I will see about fixing.

> Jan
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v2
  2011-12-01  9:49         ` Andrew Cooper
@ 2011-12-01 10:01           ` Jan Beulich
  2011-12-01 12:29             ` [RFC] KEXEC: allocate crash note buffers at boot time v3 Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2011-12-01 10:01 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

>>> On 01.12.11 at 10:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 01/12/11 09:08, Jan Beulich wrote:
>>>>> On 30.11.11 at 18:24, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> +    if ( ! note )
>>> +        /* Ideally, this would be -ENOMEM.  However, there are more problems
>>> +         * assocated with trying to deal with -ENOMEM sensibly than just
>>> +         * pretending that the crash note area doesn't exist. */
>>> +        return 0;
>> The only current caller ignores the return value, so why the bogus
>> return value?
> 
> Originally it passed the return value back up to the cpu hotplug
> notifier, but I decided that causing an -ENOMEM at this point was a
> little silly given that:
> 1) a lack of mem on boot will quickly kill the host in other ways, and
> 2) while there is no way useful way to ensure that the crashdump kernel
> gets reloaded with new notes pointers, blocking on not being able to
> allocate notes is silly.
> 
> Would you consider this enough of a problem to actually fail the
> CPU_PREPARE_UP ?

No, absolutely not. Ignoring the return value there is fine.

>>> +    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] )
>>> ...
>>> -    if ( per_cpu(crash_notes, nr) == NULL )
>>> -    {
>>> ...
>>> -    }
>> I would suggest to retry allocation here. Even if this isn't suitable for
>> a 32-bit kdump kernel on large systems, there#s no reason to penalize
>> fully 64-bit environment.
> 
> At this point, none of the allocation makes it suitable for a 32bit
> system.  For that, I need to start investigating something akin to
> xalloc_below(), which is probably going to be todays task.  If we have
> previously failed to allocate the notes buffer (and are somehow still
> running), what if anything makes it likely that we will successfully
> allocate memory this time?

Because memory got freed meanwhile? I'm particularly having post-
boot onlining of CPUs in mind; of course, if allocation failed during
boot we have bigger problems than this one.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01 10:01           ` Jan Beulich
@ 2011-12-01 12:29             ` Andrew Cooper
  2011-12-01 12:56               ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-12-01 12:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

[-- Attachment #1: Type: text/plain, Size: 308 bytes --]

Here is v3 of this patch.

After some consideration, I am not so certain the spinlock is strictly
necessary.  However, as it is not a common codepath, I figure that it is
better to be safe than sorry.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 9531 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot v3

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Changes since v2:

 *  Allocate crash_notes dynamically using nr_cpu_ids at boot time,
    rather than statically using NR_CPUS.
 *  Fix the incorrect use of signed integers for cpu id.
 *  Fix collateral damage to do_crashdump_trigger() and
    crashdump_trigger_handler caused by reordering sizeof_note() and
    setup_note()
 *  Tweak the issue about returing -ENOMEM from kexec_init_cpu_note().
    No functional change.
 *  Change kexec_get_cpu() to attempt to allocate crash note buffers
    in case we have more free memory now than when the pcpu came up.
 *  Now that there are two codepaths possibly allocating crash notes,
    protect the allocation itself with a spinlock.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.
 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.
 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r df7cec2c6c03 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,13 +25,15 @@
 #include <xen/kexec.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
+#include <xen/cpu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif
 
 bool_t kexecing = FALSE;
 
-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
+static void ** crash_notes;
+static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
 
 static Elf_Note *xen_crash_note;
 
@@ -165,13 +167,17 @@ static void one_cpu_only(void)
 void kexec_crash_save_cpu(void)
 {
     int cpu = smp_processor_id();
-    Elf_Note *note = per_cpu(crash_notes, cpu);
+    Elf_Note *note;
     ELF_Prstatus *prstatus;
     crash_xen_core_t *xencore;
 
+    BUG_ON( ! crash_notes );
+
     if ( cpumask_test_and_set_cpu(cpu, &crash_saved_cpus) )
         return;
 
+    note = crash_notes[cpu];
+
     prstatus = (ELF_Prstatus *)ELFNOTE_DESC(note);
 
     note = ELFNOTE_NEXT(note);
@@ -257,13 +263,6 @@ static struct keyhandler crashdump_trigg
     .desc = "trigger a crashdump"
 };
 
-static __init int register_crashdump_trigger(void)
-{
-    register_keyhandler('C', &crashdump_trigger_keyhandler);
-    return 0;
-}
-__initcall(register_crashdump_trigger);
-
 static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
 {
     int l = strlen(name) + 1;
@@ -280,6 +279,112 @@ static int sizeof_note(const char *name,
             ELFNOTE_ALIGN(descsz));
 }
 
+/* Allocate a crash note buffer for a newly onlined cpu. */
+static int kexec_init_cpu_notes(const unsigned long cpu)
+{
+    Elf_Note * note;
+    int ret = 0;
+    int nr_bytes = 0;
+
+    BUG_ON( cpu >= NR_CPUS || ! crash_notes );
+
+    /* If already allocated, nothing to do. */
+    if ( crash_notes[cpu] )
+        return ret;
+
+    spin_lock(&crash_notes_lock);
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_info note. */
+    if ( 0 == cpu )
+        nr_bytes = nr_bytes +
+            sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    note = xmalloc_bytes(nr_bytes);
+    if ( ! note )
+    {
+        ret = -ENOMEM;
+        goto unlock;
+    }
+
+    crash_notes[cpu] = note;
+
+    /* Setup CORE note. */
+    setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+    note = ELFNOTE_NEXT(note);
+
+    /* Setup Xen CORE note. */
+    setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+               sizeof(crash_xen_core_t));
+
+    if ( 0 == cpu )
+    {
+        /* Set up Xen Crash Info note. */
+        xen_crash_note = note = ELFNOTE_NEXT(note);
+        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                   sizeof(crash_xen_info_t));
+    }
+
+unlock:
+    spin_unlock(&crash_notes_lock);
+
+    return ret;
+}
+
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned long cpu = (unsigned long)hcpu;
+
+    /* Only hook on CPU_UP_PREPARE because once a crash_note has been reported
+     * to dom0, it must keep it around in case of a crash, as the crash kernel
+     * will be hard coded to the original physical address reported. */
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        /* Ignore return value.  If this boot time, -ENOMEM will cause all
+         * manner of problems elsewhere very soon, and if it is during runtime,
+         * then failing to allocate crash notes is not a good enough reason to
+         * fail the CPU_UP_PREPARE */
+        kexec_init_cpu_notes(cpu);
+        break;
+    default:
+        break;
+    }
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+static int __init kexec_init(void)
+{
+    void *cpu = (void *)(unsigned long)smp_processor_id();
+
+    register_keyhandler('C', &crashdump_trigger_keyhandler);
+
+    /* If no crash area, no need to allocate space for notes. */
+    if ( 0 == kexec_crash_area.size )
+        return 0;
+
+    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
+    if ( ! crash_notes )
+        return -ENOMEM;
+
+    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+/* The reason for this to be a presmp_initcall as opposed to a regular
+ * __initcall is to allow the setup of the cpu hotplug handler before APs are
+ * brought up. */
+presmp_initcall(kexec_init);
+
 static int kexec_get_reserve(xen_kexec_range_t *range)
 {
     if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
@@ -296,7 +401,12 @@ static int kexec_get_cpu(xen_kexec_range
     int nr = range->nr;
     int nr_bytes = 0;
 
-    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
+    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes)
+        return -EINVAL;
+
+    /* Try once again to allocate room for the crash notes.  It is just possible
+     * that more space has become available since we last tried. */
+    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )
         return -EINVAL;
 
     nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
@@ -306,31 +416,7 @@ static int kexec_get_cpu(xen_kexec_range
     if ( nr == 0 )
         nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
 
-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
-    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
+    range->start = __pa((unsigned long)crash_notes[nr]);
     range->size = nr_bytes;
     return 0;
 }

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01 12:29             ` [RFC] KEXEC: allocate crash note buffers at boot time v3 Andrew Cooper
@ 2011-12-01 12:56               ` Jan Beulich
  2011-12-01  5:20                 ` Keir Fraser
                                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Jan Beulich @ 2011-12-01 12:56 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>+static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;

Please use DEFINE_SPINLOCK() here.

>+    register_keyhandler('C', &crashdump_trigger_keyhandler);
>+
>+    /* If no crash area, no need to allocate space for notes. */
>+    if ( 0 == kexec_crash_area.size )
>+        return 0;

Wouldn't it make sense to switch the order of these?

>+    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));

Please use xmalloc_array() here.

>+    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )

The first check is pointless - the function will return zero if the
allocation was already done.

Further, you shouldn't take a lock around a call to xmalloc() or alike
unless absolutely necessary. It is pretty simple to avoid here - you
really only need to lock around the storing of the pointer and maybe
the setup_note() calls (but be careful with returning -ENOMEM - you
shouldn't if the allocation fails, but you then find - under the lock -
that a pointer was already set by another CPU).

Finally, one thing I failed to notice on the previous version - the
nr_bytes calculations are now being done twice. This should
probably be moved into a helper function, especially since you
said you intend to add stuff here subsequently.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01 12:56               ` Jan Beulich
  2011-12-01  5:20                 ` Keir Fraser
@ 2011-12-01 13:59                 ` Andrew Cooper
  2011-12-01 15:14                   ` Jan Beulich
  2011-12-01 15:02                 ` Andrew Cooper
  2 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-12-01 13:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

On 01/12/11 12:56, Jan Beulich wrote:
>>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
> Please use DEFINE_SPINLOCK() here.

Ok

>> +    register_keyhandler('C', &crashdump_trigger_keyhandler);
>> +
>> +    /* If no crash area, no need to allocate space for notes. */
>> +    if ( 0 == kexec_crash_area.size )
>> +        return 0;
> Wouldn't it make sense to switch the order of these?

Possibly.  In the case where a crash kernel has not been loaded, it
would degrade to a reboot, so it is still of some use if the there is no
kexec area.  Having said that, there is an explicit reboot handler, so
making this one disappear is probably a good thing.

>> +    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
> Please use xmalloc_array() here.

Yes - it was dim of me to forget that.

>> +    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )
> The first check is pointless - the function will return zero if the
> allocation was already done.

Good point - I missed that.

> Further, you shouldn't take a lock around a call to xmalloc() or alike
> unless absolutely necessary. It is pretty simple to avoid here - you
> really only need to lock around the storing of the pointer and maybe
> the setup_note() calls (but be careful with returning -ENOMEM - you
> shouldn't if the allocation fails, but you then find - under the lock -
> that a pointer was already set by another CPU).

So what we should do is this:

xmalloc
take lock
check to see if the entry is been filled in the meantime.  if so, free
the malloc'd region
release lock
only return -ENOMEM if we fail the malloc and the crash_note is still
NULL when we take the lock

I think this ought to cover all possible cases ?

(In reality I think the xmalloc itself should be covered by the fact we
will fail the !cpu_online(nr) test before we consider trying to
reallocate the buffer, but that doesn't preclude future proofing the code)

> Finally, one thing I failed to notice on the previous version - the
> nr_bytes calculations are now being done twice. This should
> probably be moved into a helper function, especially since you
> said you intend to add stuff here subsequently.

I had noticed this and was going to let it slide for now, considering
what would be best to do about it.  Playing with void pointers and
calculating lengths with sizeof is always more dangerous than
calculating a size, malloc'ing it and filling in a range start and size.

Given that it is such a rare codepath, I am honestly not sure which is
the better tradeof - an extra function call in 2 places or doubling the
size of the crash_notes array by introducing a size as well as a start. 
Both seem very minor in the grand scheme of things.

> Jan
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01  5:20                 ` Keir Fraser
@ 2011-12-01 14:00                   ` Andrew Cooper
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Cooper @ 2011-12-01 14:00 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On 01/12/11 05:20, Keir Fraser wrote:
> On 01/12/2011 12:56, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>> +    register_keyhandler('C', &crashdump_trigger_keyhandler);
>>> +
>>> +    /* If no crash area, no need to allocate space for notes. */
>>> +    if ( 0 == kexec_crash_area.size )
>>> +        return 0;
>> Wouldn't it make sense to switch the order of these?
> I also really hate this constant-first ordering.
>
>  -- Keir
>

It was a useful habit I got into when learning C/C++ and has stayed with
me.  I will endeavor to not do it.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01 12:56               ` Jan Beulich
  2011-12-01  5:20                 ` Keir Fraser
  2011-12-01 13:59                 ` Andrew Cooper
@ 2011-12-01 15:02                 ` Andrew Cooper
  2011-12-01 15:15                   ` Jan Beulich
  2 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-12-01 15:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

On 01/12/11 12:56, Jan Beulich wrote:
>>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
> Please use DEFINE_SPINLOCK() here.
>
>> +    register_keyhandler('C', &crashdump_trigger_keyhandler);
>> +
>> +    /* If no crash area, no need to allocate space for notes. */
>> +    if ( 0 == kexec_crash_area.size )
>> +        return 0;
> Wouldn't it make sense to switch the order of these?
>
>> +    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
> Please use xmalloc_array() here.
>
>> +    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )
> The first check is pointless - the function will return zero if the
> allocation was already done.

I forgot to say in my previous email.  Short circuit evaluation should
prevent the kexec_init_cpu_notes() function call actually being made.

> Further, you shouldn't take a lock around a call to xmalloc() or alike
> unless absolutely necessary. It is pretty simple to avoid here - you
> really only need to lock around the storing of the pointer and maybe
> the setup_note() calls (but be careful with returning -ENOMEM - you
> shouldn't if the allocation fails, but you then find - under the lock -
> that a pointer was already set by another CPU).
>
> Finally, one thing I failed to notice on the previous version - the
> nr_bytes calculations are now being done twice. This should
> probably be moved into a helper function, especially since you
> said you intend to add stuff here subsequently.
>
> Jan
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01 13:59                 ` Andrew Cooper
@ 2011-12-01 15:14                   ` Jan Beulich
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2011-12-01 15:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

>>> On 01.12.11 at 14:59, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 01/12/11 12:56, Jan Beulich wrote:
>> Further, you shouldn't take a lock around a call to xmalloc() or alike
>> unless absolutely necessary. It is pretty simple to avoid here - you
>> really only need to lock around the storing of the pointer and maybe
>> the setup_note() calls (but be careful with returning -ENOMEM - you
>> shouldn't if the allocation fails, but you then find - under the lock -
>> that a pointer was already set by another CPU).
> 
> So what we should do is this:
> 
> xmalloc
> take lock
> check to see if the entry is been filled in the meantime.  if so, free
> the malloc'd region

Don't call xfree() with the lock held either, if possible.

> release lock
> only return -ENOMEM if we fail the malloc and the crash_note is still
> NULL when we take the lock

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v3
  2011-12-01 15:02                 ` Andrew Cooper
@ 2011-12-01 15:15                   ` Jan Beulich
  2011-12-01 17:14                     ` [RFC] KEXEC: allocate crash note buffers at boot time v4 Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2011-12-01 15:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

>>> On 01.12.11 at 16:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 01/12/11 12:56, Jan Beulich wrote:
>>>>> On 01.12.11 at 13:29, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> +static spinlock_t crash_notes_lock = SPIN_LOCK_UNLOCKED;
>> Please use DEFINE_SPINLOCK() here.
>>
>>> +    register_keyhandler('C', &crashdump_trigger_keyhandler);
>>> +
>>> +    /* If no crash area, no need to allocate space for notes. */
>>> +    if ( 0 == kexec_crash_area.size )
>>> +        return 0;
>> Wouldn't it make sense to switch the order of these?
>>
>>> +    crash_notes = xmalloc_bytes(nr_cpu_ids * sizeof(void*));
>> Please use xmalloc_array() here.
>>
>>> +    if ( !crash_notes[nr] && 0 != kexec_init_cpu_notes(nr) )
>> The first check is pointless - the function will return zero if the
>> allocation was already done.
> 
> I forgot to say in my previous email.  Short circuit evaluation should
> prevent the kexec_init_cpu_notes() function call actually being made.

What would that buy you? Performance is not an issue on this code
path.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v4
  2011-12-01 15:15                   ` Jan Beulich
@ 2011-12-01 17:14                     ` Andrew Cooper
  2011-12-02  8:02                       ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-12-01 17:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

[-- Attachment #1: Type: text/plain, Size: 286 bytes --]

Version 4 attached.

Fixed the logic regarding locking and x{malloc,free}'ing, added a few
more comments in places, and changed the coding style to avoid
constant-first compares.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 10666 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot v4

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Changes since v3:

 *  Alter the spinlocks to avoid calling xmalloc/xfree while holding
    the lock.
 *  Tidy up the coding style used.

Changes since v2:

 *  Allocate crash_notes dynamically using nr_cpu_ids at boot time,
    rather than statically using NR_CPUS.
 *  Fix the incorrect use of signed integers for cpu id.
 *  Fix collateral damage to do_crashdump_trigger() and
    crashdump_trigger_handler caused by reordering sizeof_note() and
    setup_note()
 *  Tweak the issue about returing -ENOMEM from kexec_init_cpu_note().
    No functional change.
 *  Change kexec_get_cpu() to attempt to allocate crash note buffers
    in case we have more free memory now than when the pcpu came up.
 *  Now that there are two codepaths possibly allocating crash notes,
    protect the allocation itself with a spinlock.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.
 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.
 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r df7cec2c6c03 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,13 +25,18 @@
 #include <xen/kexec.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
+#include <xen/cpu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif
 
 bool_t kexecing = FALSE;
 
-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
+/* Memory regions to store the per cpu register state etc. on a crash. */
+static void ** crash_notes;
+
+/* Lock to prevent race conditions when allocating the crash note buffers. */
+static DEFINE_SPINLOCK(crash_notes_lock);
 
 static Elf_Note *xen_crash_note;
 
@@ -165,13 +170,17 @@ static void one_cpu_only(void)
 void kexec_crash_save_cpu(void)
 {
     int cpu = smp_processor_id();
-    Elf_Note *note = per_cpu(crash_notes, cpu);
+    Elf_Note *note;
     ELF_Prstatus *prstatus;
     crash_xen_core_t *xencore;
 
+    BUG_ON( ! crash_notes );
+
     if ( cpumask_test_and_set_cpu(cpu, &crash_saved_cpus) )
         return;
 
+    note = crash_notes[cpu];
+
     prstatus = (ELF_Prstatus *)ELFNOTE_DESC(note);
 
     note = ELFNOTE_NEXT(note);
@@ -257,13 +266,6 @@ static struct keyhandler crashdump_trigg
     .desc = "trigger a crashdump"
 };
 
-static __init int register_crashdump_trigger(void)
-{
-    register_keyhandler('C', &crashdump_trigger_keyhandler);
-    return 0;
-}
-__initcall(register_crashdump_trigger);
-
 static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
 {
     int l = strlen(name) + 1;
@@ -280,6 +282,128 @@ static int sizeof_note(const char *name,
             ELFNOTE_ALIGN(descsz));
 }
 
+/* Allocate a crash note buffer for a newly onlined cpu. */
+static int kexec_init_cpu_notes(const unsigned long cpu)
+{
+    Elf_Note * note;
+    int ret = 0;
+    int nr_bytes = 0;
+
+    BUG_ON( cpu >= nr_cpu_ids || ! crash_notes );
+
+    /* If already allocated, nothing to do. */
+    if ( crash_notes[cpu] )
+        return ret;
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_info note. */
+    if ( ! cpu )
+        nr_bytes = nr_bytes +
+            sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    note = xmalloc_bytes(nr_bytes);
+
+    /* Protect the write into crash_notes[] with a spinlock, as this function
+     * is on a hotplug path and a hypercall path. */
+    spin_lock(&crash_notes_lock);
+
+    /* If we are racing with another CPU and it has beaten us, give up
+     * gracefully. */
+    if ( crash_notes[cpu] )
+    {
+        spin_unlock(&crash_notes_lock);
+        /* Always return ok, because whether we successfully allocated or not,
+         * another CPU has successfully allocated. */
+        if ( note )
+            xfree(note);
+    }
+    else
+    {
+        crash_notes[cpu] = note;
+        spin_unlock(&crash_notes_lock);
+
+        /* If the allocation failed, and another CPU did not beat us, give
+         * up with ENOMEM. */
+        if ( ! note )
+            ret = -ENOMEM;
+        /* else all is good so lets set up the notes. */
+        else
+        {
+            /* Set up CORE note. */
+            setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+            note = ELFNOTE_NEXT(note);
+
+            /* Set up Xen CORE note. */
+            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+                       sizeof(crash_xen_core_t));
+
+            if ( ! cpu )
+            {
+                /* Set up Xen Crash Info note. */
+                xen_crash_note = note = ELFNOTE_NEXT(note);
+                setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                           sizeof(crash_xen_info_t));
+            }
+        }
+    }
+
+    return ret;
+}
+
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned long cpu = (unsigned long)hcpu;
+
+    /* Only hook on CPU_UP_PREPARE because once a crash_note has been reported
+     * to dom0, it must keep it around in case of a crash, as the crash kernel
+     * will be hard coded to the original physical address reported. */
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        /* Ignore return value.  If this boot time, -ENOMEM will cause all
+         * manner of problems elsewhere very soon, and if it is during runtime,
+         * then failing to allocate crash notes is not a good enough reason to
+         * fail the CPU_UP_PREPARE */
+        kexec_init_cpu_notes(cpu);
+        break;
+    default:
+        break;
+    }
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+static int __init kexec_init(void)
+{
+    void *cpu = (void *)(unsigned long)smp_processor_id();
+
+    /* If no crash area, no need to allocate space for notes. */
+    if ( !kexec_crash_area.size )
+        return 0;
+
+    register_keyhandler('C', &crashdump_trigger_keyhandler);
+
+    crash_notes = xmalloc_array(void *, nr_cpu_ids);
+    if ( ! crash_notes )
+        return -ENOMEM;
+
+    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+/* The reason for this to be a presmp_initcall as opposed to a regular
+ * __initcall is to allow the setup of the cpu hotplug handler before APs are
+ * brought up. */
+presmp_initcall(kexec_init);
+
 static int kexec_get_reserve(xen_kexec_range_t *range)
 {
     if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
@@ -296,7 +420,14 @@ static int kexec_get_cpu(xen_kexec_range
     int nr = range->nr;
     int nr_bytes = 0;
 
-    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
+    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes )
+        return -EINVAL;
+
+    /* Try once again to allocate room for the crash notes.  It is just possible
+     * that more space has become available since we last tried.  If space has
+     * already been allocated, kexec_init_cpu_notes() will return early with 0.
+     */
+    if ( kexec_init_cpu_notes(nr) )
         return -EINVAL;
 
     nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
@@ -306,31 +437,7 @@ static int kexec_get_cpu(xen_kexec_range
     if ( nr == 0 )
         nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
 
-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
-    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
+    range->start = __pa((unsigned long)crash_notes[nr]);
     range->size = nr_bytes;
     return 0;
 }

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v4
  2011-12-01 17:14                     ` [RFC] KEXEC: allocate crash note buffers at boot time v4 Andrew Cooper
@ 2011-12-02  8:02                       ` Jan Beulich
  2011-12-02 12:33                         ` Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2011-12-02  8:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

>>> On 01.12.11 at 18:14, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

Looks good to me now except for one minor thing (below), and the
fact that you still decided to retain the two duplicates of the size
calculation (I'll have to remember to clean this up if you don't, unless
Keir explicitly agrees with the duplication).

>+    /* Try once again to allocate room for the crash notes.  It is just possible
>+     * that more space has become available since we last tried.  If space has
>+     * already been allocated, kexec_init_cpu_notes() will return early with 0.
>+     */
>+    if ( kexec_init_cpu_notes(nr) )
>         return -EINVAL;

The function can fail only with -ENOMEM, so why not return this here?

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] KEXEC: allocate crash note buffers at boot time v4
  2011-12-02  8:02                       ` Jan Beulich
@ 2011-12-02 12:33                         ` Andrew Cooper
  2011-12-02 15:19                           ` KEXEC: allocate crash note buffers at boot time v5 Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-12-02 12:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

On 02/12/11 08:02, Jan Beulich wrote:
>>>> On 01.12.11 at 18:14, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Looks good to me now except for one minor thing (below), and the
> fact that you still decided to retain the two duplicates of the size
> calculation (I'll have to remember to clean this up if you don't, unless
> Keir explicitly agrees with the duplication).

Ok - I will turn them into a start,size pair

>> +    /* Try once again to allocate room for the crash notes.  It is just possible
>> +     * that more space has become available since we last tried.  If space has
>> +     * already been allocated, kexec_init_cpu_notes() will return early with 0.
>> +     */
>> +    if ( kexec_init_cpu_notes(nr) )
>>         return -EINVAL;
> The function can fail only with -ENOMEM, so why not return this here?
>
> Jan
>

Actually, returning -EINVAL here is counter productive.  -EINVAL is used
by dom0 to work out when it has asked for each CPU.  This in itself is a
little broken because there is nothing stopping a middle CPU from being
offline at the time these hypercalls are made.  The other thing is that
there is nothing stopping an offline cpu from having a valid notes
section.  Therefore, the test of online should be removed, so -EINVAL
only gets returned for cpu out of range, or not set up notes at all in
the first place.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* KEXEC: allocate crash note buffers at boot time v5
  2011-12-02 12:33                         ` Andrew Cooper
@ 2011-12-02 15:19                           ` Andrew Cooper
  2011-12-02 16:04                             ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2011-12-02 15:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

Here is v5.

I have tested quite carefully the impact of changing the error
conditions in kexec_get_cpu().  dom0 is happy to accept ERANGE as well
as EINVAL, and ERANGE is a more sensible response.  Moreover, this new
scheme (ignoring for a moment ENOMEM) allows for all nr_cpu_ids to give
note ranges to dom0 whether they are up or not at the moment.   It is
better to have more notes provided with some of them potentially not
filled in, than to miss a crashed CPU register state because it was
offline when the crash kernel was loaded.

All this testing has been done via backport to 4.1.2.  Minimal changes
were necessary, such as the lack of xzalloc_array() and NR_CPUS instead
of nr_cpu_ids, but it was otherwise functionally identical.

I am not sure whether to recomend this for backport into 4.1-testing or
not.  It does make the crashkernel more likely to get crash notes,
although in reality, it was only special circumstances which caused
problems.  If you consider it worth backporting, I can provide my
already-backported patch.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 11725 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot v5

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Changes since v4:

 *  Replace the current cpu crash note scheme of using void pointers
    and hand calculating the size each time is needed, by a range
    structure containing a pointer and a size.  This removes duplicate
    times where the size is calculated.
 *  Tweak kexec_get_cpu().  Don't fail if a cpu is offline because it
    may already have crash notes, and may be up by the time a crash
    happens.  Split the error conditions up to return ERANGE for an
    out-of-range cpu request rather than EINVAL.  Finally, returning a
    range of zeros is acceptable, so do this in preference to failing.

Changes since v3:

 *  Alter the spinlocks to avoid calling xmalloc/xfree while holding
    the lock.
 *  Tidy up the coding style used.

Changes since v2:

 *  Allocate crash_notes dynamically using nr_cpu_ids at boot time,
    rather than statically using NR_CPUS.
 *  Fix the incorrect use of signed integers for cpu id.
 *  Fix collateral damage to do_crashdump_trigger() and
    crashdump_trigger_handler caused by reordering sizeof_note() and
    setup_note()
 *  Tweak the issue about returing -ENOMEM from kexec_init_cpu_note().
    No functional change.
 *  Change kexec_get_cpu() to attempt to allocate crash note buffers
    in case we have more free memory now than when the pcpu came up.
 *  Now that there are two codepaths possibly allocating crash notes,
    protect the allocation itself with a spinlock.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.
 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.
 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r df7cec2c6c03 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,13 +25,19 @@
 #include <xen/kexec.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
+#include <xen/cpu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif
 
 bool_t kexecing = FALSE;
 
-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
+/* Memory regions to store the per cpu register state etc. on a crash. */
+typedef struct { Elf_Note * start; size_t size; } crash_note_range_t;
+static crash_note_range_t * crash_notes;
+
+/* Lock to prevent race conditions when allocating the crash note buffers. */
+static DEFINE_SPINLOCK(crash_notes_lock);
 
 static Elf_Note *xen_crash_note;
 
@@ -165,13 +171,17 @@ static void one_cpu_only(void)
 void kexec_crash_save_cpu(void)
 {
     int cpu = smp_processor_id();
-    Elf_Note *note = per_cpu(crash_notes, cpu);
+    Elf_Note *note;
     ELF_Prstatus *prstatus;
     crash_xen_core_t *xencore;
 
+    BUG_ON( ! crash_notes );
+
     if ( cpumask_test_and_set_cpu(cpu, &crash_saved_cpus) )
         return;
 
+    note = crash_notes[cpu].start;
+
     prstatus = (ELF_Prstatus *)ELFNOTE_DESC(note);
 
     note = ELFNOTE_NEXT(note);
@@ -257,13 +267,6 @@ static struct keyhandler crashdump_trigg
     .desc = "trigger a crashdump"
 };
 
-static __init int register_crashdump_trigger(void)
-{
-    register_keyhandler('C', &crashdump_trigger_keyhandler);
-    return 0;
-}
-__initcall(register_crashdump_trigger);
-
 static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
 {
     int l = strlen(name) + 1;
@@ -280,6 +283,129 @@ static int sizeof_note(const char *name,
             ELFNOTE_ALIGN(descsz));
 }
 
+/* Allocate a crash note buffer for a newly onlined cpu. */
+static int kexec_init_cpu_notes(const unsigned long cpu)
+{
+    Elf_Note * note;
+    int ret = 0;
+    int nr_bytes = 0;
+
+    BUG_ON( cpu >= nr_cpu_ids || ! crash_notes );
+
+    /* If already allocated, nothing to do. */
+    if ( crash_notes[cpu].start )
+        return ret;
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_info note. */
+    if ( ! cpu )
+        nr_bytes = nr_bytes +
+            sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    note = xmalloc_bytes(nr_bytes);
+
+    /* Protect the write into crash_notes[] with a spinlock, as this function
+     * is on a hotplug path and a hypercall path. */
+    spin_lock(&crash_notes_lock);
+
+    /* If we are racing with another CPU and it has beaten us, give up
+     * gracefully. */
+    if ( crash_notes[cpu].start )
+    {
+        spin_unlock(&crash_notes_lock);
+        /* Always return ok, because whether we successfully allocated or not,
+         * another CPU has successfully allocated. */
+        if ( note )
+            xfree(note);
+    }
+    else
+    {
+        crash_notes[cpu].start = note;
+        crash_notes[cpu].size = nr_bytes;
+        spin_unlock(&crash_notes_lock);
+
+        /* If the allocation failed, and another CPU did not beat us, give
+         * up with ENOMEM. */
+        if ( ! note )
+            ret = -ENOMEM;
+        /* else all is good so lets set up the notes. */
+        else
+        {
+            /* Set up CORE note. */
+            setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+            note = ELFNOTE_NEXT(note);
+
+            /* Set up Xen CORE note. */
+            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+                       sizeof(crash_xen_core_t));
+
+            if ( ! cpu )
+            {
+                /* Set up Xen Crash Info note. */
+                xen_crash_note = note = ELFNOTE_NEXT(note);
+                setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                           sizeof(crash_xen_info_t));
+            }
+        }
+    }
+
+    return ret;
+}
+
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned long cpu = (unsigned long)hcpu;
+
+    /* Only hook on CPU_UP_PREPARE because once a crash_note has been reported
+     * to dom0, it must keep it around in case of a crash, as the crash kernel
+     * will be hard coded to the original physical address reported. */
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        /* Ignore return value.  If this boot time, -ENOMEM will cause all
+         * manner of problems elsewhere very soon, and if it is during runtime,
+         * then failing to allocate crash notes is not a good enough reason to
+         * fail the CPU_UP_PREPARE */
+        kexec_init_cpu_notes(cpu);
+        break;
+    default:
+        break;
+    }
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+static int __init kexec_init(void)
+{
+    void *cpu = (void *)(unsigned long)smp_processor_id();
+
+    /* If no crash area, no need to allocate space for notes. */
+    if ( !kexec_crash_area.size )
+        return 0;
+
+    register_keyhandler('C', &crashdump_trigger_keyhandler);
+
+    crash_notes = xzalloc_array(crash_note_range_t, nr_cpu_ids);
+    if ( ! crash_notes )
+        return -ENOMEM;
+
+    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+/* The reason for this to be a presmp_initcall as opposed to a regular
+ * __initcall is to allow the setup of the cpu hotplug handler before APs are
+ * brought up. */
+presmp_initcall(kexec_init);
+
 static int kexec_get_reserve(xen_kexec_range_t *range)
 {
     if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
@@ -294,44 +420,23 @@ static int kexec_get_reserve(xen_kexec_r
 static int kexec_get_cpu(xen_kexec_range_t *range)
 {
     int nr = range->nr;
-    int nr_bytes = 0;
 
-    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
+    if ( nr < 0 || nr >= nr_cpu_ids )
+        return -ERANGE;
+
+    if ( ! crash_notes )
         return -EINVAL;
 
-    nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
-    nr_bytes += sizeof_note("Xen", sizeof(crash_xen_core_t));
+    /* Try once again to allocate room for the crash notes.  It is just possible
+     * that more space has become available since we last tried.  If space has
+     * already been allocated, kexec_init_cpu_notes() will return early with 0.
+     */
+    kexec_init_cpu_notes(nr);
 
-    /* The Xen info note is included in CPU0's range. */
-    if ( nr == 0 )
-        nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
-
-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
-    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
-    range->size = nr_bytes;
+    /* In the case of still not having enough memory to allocate buffer room,
+     * returning a range of 0,0 is still valid. */
+    range->start = __pa((unsigned long)crash_notes[nr].start);
+    range->size = crash_notes[nr].size;
     return 0;
 }
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: KEXEC: allocate crash note buffers at boot time v5
  2011-12-02 15:19                           ` KEXEC: allocate crash note buffers at boot time v5 Andrew Cooper
@ 2011-12-02 16:04                             ` Jan Beulich
  2011-12-02 16:10                               ` KEXEC: allocate crash note buffers at boot time v Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2011-12-02 16:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

(Andrew, my previous reply dropped all Cc-s, but I'm re-sending not only
because of that - I also adjusted the first part of the response.)

>>> On 02.12.11 at 16:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

I just had another look at the Dom0 side of things, and I fail to see why
you think moving this out of per-CPU data is necessary: All Dom0 does
with the provided info is set up the resource tree. The data doesn't get
stored for any post-boot use. What am I overlooking?

>+    /* In the case of still not having enough memory to allocate buffer room,
>+     * returning a range of 0,0 is still valid. */
>+    range->start = __pa((unsigned long)crash_notes[nr].start);
>+    range->size = crash_notes[nr].size;

Comment and implementation don't match - __pa(NULL) is not zero
(and the cast to 'unsigned long' is pointless afaict).

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: KEXEC: allocate crash note buffers at boot time v
  2011-12-02 16:04                             ` Jan Beulich
@ 2011-12-02 16:10                               ` Andrew Cooper
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Cooper @ 2011-12-02 16:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1368 bytes --]

On 02/12/11 16:04, Jan Beulich wrote:
> (Andrew, my previous reply dropped all Cc-s, but I'm re-sending not only
> because of that - I also adjusted the first part of the response.)
>
>>>> On 02.12.11 at 16:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> I just had another look at the Dom0 side of things, and I fail to see why
> you think moving this out of per-CPU data is necessary: All Dom0 does
> with the provided info is set up the resource tree. The data doesn't get
> stored for any post-boot use. What am I overlooking?

(for the benifit of the list)

/sbin/kexec opens /proc/iomem and parses the strings looking for "Crash
Note" and pulls the ranges out that way.  (It also assumes that the
lowest crash note is pcpu0 and so on, which is also something I will
work around in later notes changes).

>> +    /* In the case of still not having enough memory to allocate buffer room,
>> +     * returning a range of 0,0 is still valid. */
>> +    range->start = __pa((unsigned long)crash_notes[nr].start);
>> +    range->size = crash_notes[nr].size;
> Comment and implementation don't match - __pa(NULL) is not zero
> (and the cast to 'unsigned long' is pointless afaict).
> Jan

Very silly mistake on my behalf.  v6 is attached with that fixed.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 11903 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot v6

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Changes since v5:
 *  Fix silly mistake about returning a 0 range in kexec_get_cpu()

Changes since v4:

 *  Replace the current cpu crash note scheme of using void pointers
    and hand calculating the size each time is needed, by a range
    structure containing a pointer and a size.  This removes duplicate
    times where the size is calculated.
 *  Tweak kexec_get_cpu().  Don't fail if a cpu is offline because it
    may already have crash notes, and may be up by the time a crash
    happens.  Split the error conditions up to return ERANGE for an
    out-of-range cpu request rather than EINVAL.  Finally, returning a
    range of zeros is acceptable, so do this in preference to failing.

Changes since v3:

 *  Alter the spinlocks to avoid calling xmalloc/xfree while holding
    the lock.
 *  Tidy up the coding style used.

Changes since v2:

 *  Allocate crash_notes dynamically using nr_cpu_ids at boot time,
    rather than statically using NR_CPUS.
 *  Fix the incorrect use of signed integers for cpu id.
 *  Fix collateral damage to do_crashdump_trigger() and
    crashdump_trigger_handler caused by reordering sizeof_note() and
    setup_note()
 *  Tweak the issue about returing -ENOMEM from kexec_init_cpu_note().
    No functional change.
 *  Change kexec_get_cpu() to attempt to allocate crash note buffers
    in case we have more free memory now than when the pcpu came up.
 *  Now that there are two codepaths possibly allocating crash notes,
    protect the allocation itself with a spinlock.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.
 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.
 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r df7cec2c6c03 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,13 +25,19 @@
 #include <xen/kexec.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
+#include <xen/cpu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif
 
 bool_t kexecing = FALSE;
 
-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
+/* Memory regions to store the per cpu register state etc. on a crash. */
+typedef struct { Elf_Note * start; size_t size; } crash_note_range_t;
+static crash_note_range_t * crash_notes;
+
+/* Lock to prevent race conditions when allocating the crash note buffers. */
+static DEFINE_SPINLOCK(crash_notes_lock);
 
 static Elf_Note *xen_crash_note;
 
@@ -165,13 +171,17 @@ static void one_cpu_only(void)
 void kexec_crash_save_cpu(void)
 {
     int cpu = smp_processor_id();
-    Elf_Note *note = per_cpu(crash_notes, cpu);
+    Elf_Note *note;
     ELF_Prstatus *prstatus;
     crash_xen_core_t *xencore;
 
+    BUG_ON( ! crash_notes );
+
     if ( cpumask_test_and_set_cpu(cpu, &crash_saved_cpus) )
         return;
 
+    note = crash_notes[cpu].start;
+
     prstatus = (ELF_Prstatus *)ELFNOTE_DESC(note);
 
     note = ELFNOTE_NEXT(note);
@@ -257,13 +267,6 @@ static struct keyhandler crashdump_trigg
     .desc = "trigger a crashdump"
 };
 
-static __init int register_crashdump_trigger(void)
-{
-    register_keyhandler('C', &crashdump_trigger_keyhandler);
-    return 0;
-}
-__initcall(register_crashdump_trigger);
-
 static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
 {
     int l = strlen(name) + 1;
@@ -280,6 +283,129 @@ static int sizeof_note(const char *name,
             ELFNOTE_ALIGN(descsz));
 }
 
+/* Allocate a crash note buffer for a newly onlined cpu. */
+static int kexec_init_cpu_notes(const unsigned long cpu)
+{
+    Elf_Note * note;
+    int ret = 0;
+    int nr_bytes = 0;
+
+    BUG_ON( cpu >= nr_cpu_ids || ! crash_notes );
+
+    /* If already allocated, nothing to do. */
+    if ( crash_notes[cpu].start )
+        return ret;
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_info note. */
+    if ( ! cpu )
+        nr_bytes = nr_bytes +
+            sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    note = xmalloc_bytes(nr_bytes);
+
+    /* Protect the write into crash_notes[] with a spinlock, as this function
+     * is on a hotplug path and a hypercall path. */
+    spin_lock(&crash_notes_lock);
+
+    /* If we are racing with another CPU and it has beaten us, give up
+     * gracefully. */
+    if ( crash_notes[cpu].start )
+    {
+        spin_unlock(&crash_notes_lock);
+        /* Always return ok, because whether we successfully allocated or not,
+         * another CPU has successfully allocated. */
+        if ( note )
+            xfree(note);
+    }
+    else
+    {
+        crash_notes[cpu].start = note;
+        crash_notes[cpu].size = nr_bytes;
+        spin_unlock(&crash_notes_lock);
+
+        /* If the allocation failed, and another CPU did not beat us, give
+         * up with ENOMEM. */
+        if ( ! note )
+            ret = -ENOMEM;
+        /* else all is good so lets set up the notes. */
+        else
+        {
+            /* Set up CORE note. */
+            setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+            note = ELFNOTE_NEXT(note);
+
+            /* Set up Xen CORE note. */
+            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+                       sizeof(crash_xen_core_t));
+
+            if ( ! cpu )
+            {
+                /* Set up Xen Crash Info note. */
+                xen_crash_note = note = ELFNOTE_NEXT(note);
+                setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                           sizeof(crash_xen_info_t));
+            }
+        }
+    }
+
+    return ret;
+}
+
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned long cpu = (unsigned long)hcpu;
+
+    /* Only hook on CPU_UP_PREPARE because once a crash_note has been reported
+     * to dom0, it must keep it around in case of a crash, as the crash kernel
+     * will be hard coded to the original physical address reported. */
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        /* Ignore return value.  If this boot time, -ENOMEM will cause all
+         * manner of problems elsewhere very soon, and if it is during runtime,
+         * then failing to allocate crash notes is not a good enough reason to
+         * fail the CPU_UP_PREPARE */
+        kexec_init_cpu_notes(cpu);
+        break;
+    default:
+        break;
+    }
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+static int __init kexec_init(void)
+{
+    void *cpu = (void *)(unsigned long)smp_processor_id();
+
+    /* If no crash area, no need to allocate space for notes. */
+    if ( !kexec_crash_area.size )
+        return 0;
+
+    register_keyhandler('C', &crashdump_trigger_keyhandler);
+
+    crash_notes = xzalloc_array(crash_note_range_t, nr_cpu_ids);
+    if ( ! crash_notes )
+        return -ENOMEM;
+
+    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+/* The reason for this to be a presmp_initcall as opposed to a regular
+ * __initcall is to allow the setup of the cpu hotplug handler before APs are
+ * brought up. */
+presmp_initcall(kexec_init);
+
 static int kexec_get_reserve(xen_kexec_range_t *range)
 {
     if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
@@ -294,44 +420,29 @@ static int kexec_get_reserve(xen_kexec_r
 static int kexec_get_cpu(xen_kexec_range_t *range)
 {
     int nr = range->nr;
-    int nr_bytes = 0;
 
-    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
+    if ( nr < 0 || nr >= nr_cpu_ids )
+        return -ERANGE;
+
+    if ( ! crash_notes )
         return -EINVAL;
 
-    nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
-    nr_bytes += sizeof_note("Xen", sizeof(crash_xen_core_t));
+    /* Try once again to allocate room for the crash notes.  It is just possible
+     * that more space has become available since we last tried.  If space has
+     * already been allocated, kexec_init_cpu_notes() will return early with 0.
+     */
+    kexec_init_cpu_notes(nr);
 
-    /* The Xen info note is included in CPU0's range. */
-    if ( nr == 0 )
-        nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
+    /* In the case of still not having enough memory to allocate buffer room,
+     * returning a range of 0,0 is still valid. */
+    if ( crash_notes[nr].start )
+    {
+        range->start = __pa(crash_notes[nr].start);
+        range->size = crash_notes[nr].size;
+    }
+    else
+        range->start = range->size = 0;
 
-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
-    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
-    range->size = nr_bytes;
     return 0;
 }
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-12-02 16:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-29 18:56 [RFC] KEXEC: allocate crash note buffers at boot time Andrew Cooper
2011-11-29 11:19 ` Keir Fraser
2011-11-30 13:14   ` Andrew Cooper
2011-11-30 17:24     ` [RFC] KEXEC: allocate crash note buffers at boot time v2 Andrew Cooper
2011-12-01  9:08       ` Jan Beulich
2011-12-01  9:49         ` Andrew Cooper
2011-12-01 10:01           ` Jan Beulich
2011-12-01 12:29             ` [RFC] KEXEC: allocate crash note buffers at boot time v3 Andrew Cooper
2011-12-01 12:56               ` Jan Beulich
2011-12-01  5:20                 ` Keir Fraser
2011-12-01 14:00                   ` Andrew Cooper
2011-12-01 13:59                 ` Andrew Cooper
2011-12-01 15:14                   ` Jan Beulich
2011-12-01 15:02                 ` Andrew Cooper
2011-12-01 15:15                   ` Jan Beulich
2011-12-01 17:14                     ` [RFC] KEXEC: allocate crash note buffers at boot time v4 Andrew Cooper
2011-12-02  8:02                       ` Jan Beulich
2011-12-02 12:33                         ` Andrew Cooper
2011-12-02 15:19                           ` KEXEC: allocate crash note buffers at boot time v5 Andrew Cooper
2011-12-02 16:04                             ` Jan Beulich
2011-12-02 16:10                               ` KEXEC: allocate crash note buffers at boot time v Andrew Cooper
2011-11-30  9:20 ` [RFC] KEXEC: allocate crash note buffers at boot time Jan Beulich
2011-11-30 14:01   ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.