* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory @ 2011-12-29 15:51 Jan Beulich 2011-12-30 11:19 ` Tim Deegan 2011-12-31 0:11 ` Andrew Cooper 0 siblings, 2 replies; 11+ messages in thread From: Jan Beulich @ 2011-12-29 15:51 UTC (permalink / raw) To: andrew.cooper3; +Cc: xen-devel >>> Andrew Cooper 12/23/11 12:59 PM >>> >On 23/12/11 09:06, Jan Beulich wrote: >>>>> On 22.12.11 at 18:36, Andrew Cooper wrote: >>> On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such >>> as the CPU crash notes will go into the xenheap, which tends to be in >>> upper memory. This causes problems on machines with more than 64GB >>> (or 4GB if no PAE support) of ram as the crashkernel physically cant >>> access the crash notes. >> What use is a crash dump taken with a 32-bit kernel on a machine >> with more than 64G (or more than 4G is the crash kernel doesn't >> support PAE)? > >Very little use at all, which is the reason for this change. With this change, the usefulness doesn't significantly increase imo. >The correct solution is indeed to use a 64bit dom0 kernel, but while >politics is preventing that from happening, another solution has to be >found. I doubt that XenServer is the only product which would benifit, >which is why the changes are presented for upstreaming. > >>> 3) Change the conring buffer to use lower memory when instructed. >> Why does this one need moving, but none of the other Xen data >> structures? If anything, I would have expected you to limit the Xen >> heap altogether, so that automatically all allocations get done in a >> way so that at least Xen's data structures are available in the >> (truncated) dump. > >This is part of the "min" option which is trying to have the least >possible impact. The idea is to have this in low memory, use the >"console_to_ring" boot option to copy dom0 dmesg into conring, and pass >its physical address and size in a crash note, so that the crash kernel >environment grab it all. Why is the console ring *that* important? I would have thought that proper register values and stack contents are much more significant for analysis of a crash. Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-29 15:51 [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory Jan Beulich @ 2011-12-30 11:19 ` Tim Deegan 2011-12-31 0:11 ` Andrew Cooper 1 sibling, 0 replies; 11+ messages in thread From: Tim Deegan @ 2011-12-30 11:19 UTC (permalink / raw) To: Jan Beulich; +Cc: andrew.cooper3, xen-devel At 15:51 +0000 on 29 Dec (1325173875), Jan Beulich wrote: > >This is part of the "min" option which is trying to have the least > >possible impact. The idea is to have this in low memory, use the > >"console_to_ring" boot option to copy dom0 dmesg into conring, and pass > >its physical address and size in a crash note, so that the crash kernel > >environment grab it all. > > Why is the console ring *that* important? I would have thought > that proper register values and stack contents are much more > significant for analysis of a crash. The console ring has been _very_ useful in diagnosing bugs from field reports, especially things like guest-level watchdog timeouts and refcounting errors that cause a crash after the interesting event has passed. If nothing else it lets you verify the user's description of the system, and saves a round-trip of 'please try to set up a serial logger and reproduce the bug'). (I agree, though, that using a 64-bit crash kernel would be a much better idea). Tim. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-29 15:51 [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory Jan Beulich 2011-12-30 11:19 ` Tim Deegan @ 2011-12-31 0:11 ` Andrew Cooper 2011-12-31 7:38 ` Jan Beulich 1 sibling, 1 reply; 11+ messages in thread From: Andrew Cooper @ 2011-12-31 0:11 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel On 29/12/2011 15:51, Jan Beulich wrote: >>>> Andrew Cooper 12/23/11 12:59 PM >>> >> On 23/12/11 09:06, Jan Beulich wrote: >>>>>> On 22.12.11 at 18:36, Andrew Cooper wrote: >>>> On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such >>>> as the CPU crash notes will go into the xenheap, which tends to be in >>>> upper memory. This causes problems on machines with more than 64GB >>>> (or 4GB if no PAE support) of ram as the crashkernel physically cant >>>> access the crash notes. >>> What use is a crash dump taken with a 32-bit kernel on a machine >>> with more than 64G (or more than 4G is the crash kernel doesn't >>> support PAE)? >> Very little use at all, which is the reason for this change. > With this change, the usefulness doesn't significantly increase imo. But it does make it possible for a 32bit crashkernel to get information useful for debugging a problem, which is the point of the patch. If you honestly thing that this change is not worth the effort then that is fine, but as far as I am aware, there are still several good reasons to use a 32bit dom0 over a 64bit dom0. Following this reasoning is why I believe that this feature may be useful to other projects, as well as XenServer. >> The correct solution is indeed to use a 64bit dom0 kernel, but while >> politics is preventing that from happening, another solution has to be >> found. I doubt that XenServer is the only product which would benifit, >> which is why the changes are presented for upstreaming. >> >>>> 3) Change the conring buffer to use lower memory when instructed. >>> Why does this one need moving, but none of the other Xen data >>> structures? If anything, I would have expected you to limit the Xen >>> heap altogether, so that automatically all allocations get done in a >>> way so that at least Xen's data structures are available in the >>> (truncated) dump. >> This is part of the "min" option which is trying to have the least >> possible impact. The idea is to have this in low memory, use the >> "console_to_ring" boot option to copy dom0 dmesg into conring, and pass >> its physical address and size in a crash note, so that the crash kernel >> environment grab it all. > Why is the console ring *that* important? I would have thought > that proper register values and stack contents are much more > significant for analysis of a crash. > > Jan In combination with "console_to_ring" on the Xen command line, the dom0 serial console will be copied into xen console ring. In the case of a crash, it is highly likely that one or other of them have panic()'d into their respective console rings. It is not a foolproof solution, but in will work in the general case. The register contents of the pcpus which were running will be available in the PR_STATUS notes which are also deliberately allocated in low memory by this patch. To the best of my understanding; to get to the dom0 vcpu state, the crashkernel needs access to the domain structs and vcpu structs (which I believe are actually allocated below 4GiB) and the Xen page tables which dom0 uses (which inspecting CR3 from the crash notes is certainly not in lower memory). I suspect that there is also more which needs to be allocated in lower memory to get a full register dump, stack dump, stack trace etc. The plan is to also have the "all" option from the command line which will also allocate the page tables (and other structures where relevant) in lower memory, but this is rather more of an overhead than just the console ring and crash notes, which will have a more visible impact to customers running 32bit PV guests. This is the reason for separating the two via a command line argument. ~Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-31 0:11 ` Andrew Cooper @ 2011-12-31 7:38 ` Jan Beulich 2012-01-03 11:58 ` Andrew Cooper 0 siblings, 1 reply; 11+ messages in thread From: Jan Beulich @ 2011-12-31 7:38 UTC (permalink / raw) To: andrew.cooper3; +Cc: xen-devel >>> Andrew Cooper <andrew.cooper3@citrix.com> 12/31/11 1:12 AM >>> >The register contents of the pcpus which were running will be available >in the PR_STATUS notes which are also deliberately allocated in low >memory by this patch. To the best of my understanding; to get to the >dom0 vcpu state, the crashkernel needs access to the domain structs and >vcpu structs (which I believe are actually allocated below 4GiB) and the The vCPU ones are, while the domain one isn't. >Xen page tables which dom0 uses (which inspecting CR3 from the crash >notes is certainly not in lower memory). I suspect that there is also >more which needs to be allocated in lower memory to get a full register >dump, stack dump, stack trace etc. > >The plan is to also have the "all" option from the command line which >will also allocate the page tables (and other structures where relevant) >in lower memory, but this is rather more of an overhead than just the >console ring and crash notes, which will have a more visible impact to >customers running 32bit PV guests. This is the reason for separating >the two via a command line argument. I'm wondering whether, with much less impact on existing code, and as already pointed out, restricting the allocation range of alloc_xenheap_pages() (by way of a command line option) wouldn't get you what you want. Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-31 7:38 ` Jan Beulich @ 2012-01-03 11:58 ` Andrew Cooper 0 siblings, 0 replies; 11+ messages in thread From: Andrew Cooper @ 2012-01-03 11:58 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel On 31/12/11 07:38, Jan Beulich wrote: >>>> Andrew Cooper <andrew.cooper3@citrix.com> 12/31/11 1:12 AM >>> >> The register contents of the pcpus which were running will be available >> in the PR_STATUS notes which are also deliberately allocated in low >> memory by this patch. To the best of my understanding; to get to the >> dom0 vcpu state, the crashkernel needs access to the domain structs and >> vcpu structs (which I believe are actually allocated below 4GiB) and the > The vCPU ones are, while the domain one isn't. Right - that is useful to know. >> Xen page tables which dom0 uses (which inspecting CR3 from the crash >> notes is certainly not in lower memory). I suspect that there is also >> more which needs to be allocated in lower memory to get a full register >> dump, stack dump, stack trace etc. >> >> The plan is to also have the "all" option from the command line which >> will also allocate the page tables (and other structures where relevant) >> in lower memory, but this is rather more of an overhead than just the >> console ring and crash notes, which will have a more visible impact to >> customers running 32bit PV guests. This is the reason for separating >> the two via a command line argument. > I'm wondering whether, with much less impact on existing code, and as > already pointed out, restricting the allocation range of > alloc_xenheap_pages() (by way of a command line option) wouldn't get > you what you want. > > Jan Assuming that all relevant structures do in fact get allocated by alloc_xenheap_pages() (which is probably a safe assumption looking at the code, but I am not certain) then yes, it would have much less impact on the existing code. However, it would have a much bigger impact on the other memory restricted items. It might be possible (and perhaps a good idea when extreme debugging is needed) to have a 3rd command line option for low_crashinfo= which does this, but I dont like it as the general solution to this problem. -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 0 of 3] KEXEC fixes @ 2011-12-22 17:36 Andrew Cooper 2011-12-22 17:36 ` [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory Andrew Cooper 0 siblings, 1 reply; 11+ messages in thread From: Andrew Cooper @ 2011-12-22 17:36 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper This set of 3 patches contain fixes and functional improvements to the kexec path in Xen. The first two have already been posted to the list individually but are presented here as a group. Patch 1 contains an implementation of a new hypercall which will not truncate 64bit pointers to 32bit when in compatability mode. Patch 2 changes the per cpu crash notes to be allocated on boot. Patch 3 implements logic to allow Xen to specifically allocate the crash notes and console buffer in lower memory because if 64/32bit issues on machines with large quantities of RAM. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-22 17:36 [PATCH 0 of 3] KEXEC fixes Andrew Cooper @ 2011-12-22 17:36 ` Andrew Cooper 2011-12-22 18:09 ` David Vrabel 2011-12-23 9:06 ` Jan Beulich 0 siblings, 2 replies; 11+ messages in thread From: Andrew Cooper @ 2011-12-22 17:36 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such as the CPU crash notes will go into the xenheap, which tends to be in upper memory. This causes problems on machines with more than 64GB (or 4GB if no PAE support) of ram as the crashkernel physically cant access the crash notes. The solution is to force Xen to allocate certain structures in lower memory. This is achieved by introducing two new command line parameters; low_crashinfo and crashinfo_maxaddr. Because of the potential impact on 32bit PV guests, and that this problem does not exist for 64bit dom0 on 64bit Xen, this new functionality defaults to the codebase's previous behavior, requiring the user to explicitly add extra command line parameters to change the behavior. This patch consists of 3 logically distinct but closely related changes. 1) Add the two new command line parameters. 2) Change crash note allocation to use lower memory when instructed. 3) Change the conring buffer to use lower memory when instructed. There result is that the crash notes and console ring will be placed in lower memory so useful information can be recovered in the case of a crash. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> diff -r 3da37c68284f -r 23c31d59ffb1 xen/arch/x86/setup.c --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -586,6 +586,10 @@ void __init __start_xen(unsigned long mb } cmdline_parse(cmdline); + /* Must be after command line argument parsing and before + * allocing any xenheap structures wanted in lower memory. */ + kexec_early_calculations(); + parse_video_info(); set_current((struct vcpu *)0xfffff000); /* debug sanity */ diff -r 3da37c68284f -r 23c31d59ffb1 xen/common/kexec.c --- a/xen/common/kexec.c +++ b/xen/common/kexec.c @@ -36,7 +36,9 @@ bool_t kexecing = FALSE; typedef struct { Elf_Note * start; size_t size; } crash_note_range_t; static crash_note_range_t * crash_notes; -/* Lock to prevent race conditions when allocating the crash note buffers. */ +/* Lock to prevent race conditions when allocating the crash note buffers. + * It also serves to protect calls to alloc_from_crash_heap when allocating + * crash note buffers in lower memory. */ static DEFINE_SPINLOCK(crash_notes_lock); static Elf_Note *xen_crash_note; @@ -70,6 +72,23 @@ static struct { (_d_)->start = (_s_)->start; \ } while (0) +/* Low crashinfo mode. Start as INVALID so serveral codepaths can set up + * defaults without needing to know the state of the others. */ +enum low_crashinfo low_crashinfo_mode = LOW_CRASHINFO_INVALID; + +/* This value is only considered if low_crash_mode is set to MIN or ALL, so + * setting a default here is safe. Default to 4GB. This is because the current + * KEXEC_CMD_get_range compat hypercall trucates 64bit pointers to 32 bits. The + * typical usecase for crashinfo_maxaddr will be for 64bit Xen with 32bit dom0 + * and 32bit crash kernel. */ +static paddr_t crashinfo_maxaddr = 4ULL << 30; + +/* = log base 2 of crashinfo_maxaddr after checking for sanity. */ +paddr_t crashinfo_maxaddr_bits = 0; + +/* Pointers to keep track of the crash heap region. */ +static void *crash_heap_current = NULL, *crash_heap_end = NULL; + /* * Parse command lines in the format * @@ -148,6 +167,58 @@ static void __init parse_crashkernel(con } custom_param("crashkernel", parse_crashkernel); +/* Parse command lines in the format: + * + * low_crashinfo=[none,min,all] + * + * - none disables the low allocation of crash info. + * - min will allocate enough low information for the crash kernel to be able + * to extract the hypervisor and dom0 message ring buffers. + * - all will allocate additional structures such as domain and vcpu structs + * low so the crash kernel can perform an extended analysis of state. + */ +static void __init parse_low_crashinfo(const char * str) +{ + + if ( !strlen(str) ) + /* default to min if user just specifies "low_crashinfo" */ + low_crashinfo_mode = LOW_CRASHINFO_MIN; + else if ( !strcmp(str, "none" ) ) + low_crashinfo_mode = LOW_CRASHINFO_NONE; + else if ( !strcmp(str, "min" ) ) + low_crashinfo_mode = LOW_CRASHINFO_MIN; + else if ( !strcmp(str, "all" ) ) + low_crashinfo_mode = LOW_CRASHINFO_ALL; + else + { + printk("Unknown low_crashinfo parameter '%s'. Defaulting to min.\n", str); + low_crashinfo_mode = LOW_CRASHINFO_MIN; + } +} +custom_param("low_crashinfo", parse_low_crashinfo); + +/* Parse command lines in the format: + * + * crashinfo_maxaddr=<addr> + * + * <addr> will be rounded down to the nearest power of two. Defaults to 64G + */ +static void __init parse_crashinfo_maxaddr(const char * str) +{ + u64 addr; + + /* if low_crashinfo_mode is unset, default to min. */ + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) + low_crashinfo_mode = LOW_CRASHINFO_MIN; + + if ( (addr = parse_size_and_unit(str, NULL)) ) + crashinfo_maxaddr = addr; + else + printk("Unable to parse crashinfo_maxaddr. Defaulting to %p\n", + (void*)crashinfo_maxaddr); +} +custom_param("crashinfo_maxaddr", parse_crashinfo_maxaddr); + void __init set_kexec_crash_area_size(u64 system_ram) { unsigned int idx; @@ -306,6 +377,20 @@ static size_t sizeof_cpu_notes(const uns return bytes; } +/* Allocate size_t bytes of space from the previously allocated + * crash heap if the user has requested that crash notes be allocated + * in lower memory. There is currently no case where the crash notes + * should be free()'d. */ +static void * alloc_from_crash_heap(const size_t bytes) +{ + void * ret; + if ( crash_heap_current + bytes > crash_heap_end ) + return NULL; + ret = (void*)crash_heap_current; + crash_heap_current += bytes; + return ret; +} + /* Allocate a crash note buffer for a newly onlined cpu. */ static int kexec_init_cpu_notes(const unsigned long cpu) { @@ -320,7 +405,10 @@ static int kexec_init_cpu_notes(const un return ret; nr_bytes = sizeof_cpu_notes(cpu); - note = xmalloc_bytes(nr_bytes); + + /* If we dont care about the position of allocation, malloc. */ + if ( low_crashinfo_mode == LOW_CRASHINFO_NONE ) + note = xmalloc_bytes(nr_bytes); /* Protect the write into crash_notes[] with a spinlock, as this function * is on a hotplug path and a hypercall path. */ @@ -338,6 +426,11 @@ static int kexec_init_cpu_notes(const un } else { + /* If we care about memory possition, alloc from the crash heap, + * also protected by the crash_notes_lock. */ + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) + note = alloc_from_crash_heap(nr_bytes); + crash_notes[cpu].start = note; crash_notes[cpu].size = nr_bytes; spin_unlock(&crash_notes_lock); @@ -397,6 +490,24 @@ static struct notifier_block cpu_nfb = { .notifier_call = cpu_callback }; +void __init kexec_early_calculations(void) +{ + /* If low_crashinfo_mode is still INVALID, neither "low_crashinfo" nor + * "crashinfo_maxaddr" have been specified on the command line, so + * explicitly set to NONE. */ + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) + low_crashinfo_mode = LOW_CRASHINFO_NONE; + + crashinfo_maxaddr_bits = 0; + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) + { + paddr_t tmp = crashinfo_maxaddr; + + while ( tmp >>= 1 ) + crashinfo_maxaddr_bits++; + } +} + static int __init kexec_init(void) { void *cpu = (void *)(unsigned long)smp_processor_id(); @@ -407,6 +518,29 @@ static int __init kexec_init(void) register_keyhandler('C', &crashdump_trigger_keyhandler); + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) + { + size_t crash_heap_size; + + /* This calculation is safe even if the machine is booted in + * uniprocessor mode. */ + crash_heap_size = sizeof_cpu_notes(0) + + sizeof_cpu_notes(1) * (nr_cpu_ids - 1); + crash_heap_size = PAGE_ALIGN(crash_heap_size); + + crash_heap_current = alloc_xenheap_pages( + get_order_from_bytes(crash_heap_size), + MEMF_bits(crashinfo_maxaddr_bits) ); + + if ( crash_heap_current ) + crash_heap_end = crash_heap_current + crash_heap_size; + else + return -ENOMEM; + } + + /* crash_notes may be allocated anywhere Xen can reach in memory. + Only the individual CPU crash notes themselves must be allocated + in lower memory if requested. */ crash_notes = xmalloc_array(crash_note_range_t, nr_cpu_ids); if ( ! crash_notes ) return -ENOMEM; diff -r 3da37c68284f -r 23c31d59ffb1 xen/drivers/char/console.c --- a/xen/drivers/char/console.c +++ b/xen/drivers/char/console.c @@ -584,6 +584,7 @@ void __init console_init_postirq(void) { char *ring; unsigned int i, order; + u64 memflags; serial_init_postirq(); @@ -591,7 +592,8 @@ void __init console_init_postirq(void) opt_conring_size = num_present_cpus() << (9 + xenlog_lower_thresh); order = get_order_from_bytes(max(opt_conring_size, conring_size)); - while ( (ring = alloc_xenheap_pages(order, 0)) == NULL ) + memflags = low_crashinfo_mode > LOW_CRASHINFO_NONE ? MEMF_bits(crashinfo_maxaddr_bits) : 0; + while ( (ring = alloc_xenheap_pages(order, memflags)) == NULL ) { BUG_ON(order == 0); order--; diff -r 3da37c68284f -r 23c31d59ffb1 xen/include/xen/kexec.h --- a/xen/include/xen/kexec.h +++ b/xen/include/xen/kexec.h @@ -25,6 +25,19 @@ void set_kexec_crash_area_size(u64 syste #define KEXEC_IMAGE_CRASH_BASE 2 #define KEXEC_IMAGE_NR 4 +enum low_crashinfo { + LOW_CRASHINFO_INVALID = 0, + LOW_CRASHINFO_NONE = 1, + LOW_CRASHINFO_MIN = 2, + LOW_CRASHINFO_ALL = 3 +}; + +/* Low crashinfo mode. Start as INVALID so serveral codepaths can set up + * defaults without needing to know the state of the others. */ +extern enum low_crashinfo low_crashinfo_mode; +extern paddr_t crashinfo_maxaddr_bits; +void kexec_early_calculations(void); + int machine_kexec_load(int type, int slot, xen_kexec_image_t *image); void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image); void machine_kexec_reserved(xen_kexec_reserve_t *reservation); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-22 17:36 ` [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory Andrew Cooper @ 2011-12-22 18:09 ` David Vrabel 2011-12-22 18:44 ` Andrew Cooper 2011-12-23 9:06 ` Jan Beulich 1 sibling, 1 reply; 11+ messages in thread From: David Vrabel @ 2011-12-22 18:09 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel On 22/12/11 17:36, Andrew Cooper wrote: > On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such > as the CPU crash notes will go into the xenheap, which tends to be in > upper memory. This causes problems on machines with more than 64GB > (or 4GB if no PAE support) of ram as the crashkernel physically cant > access the crash notes. I've never been entirely convinced that this was the correct approach. It seems that using a 64-bit crash kernel would be an easier solution. > @@ -320,7 +405,10 @@ static int kexec_init_cpu_notes(const un > return ret; > > nr_bytes = sizeof_cpu_notes(cpu); > - note = xmalloc_bytes(nr_bytes); > + > + /* If we dont care about the position of allocation, malloc. */ > + if ( low_crashinfo_mode == LOW_CRASHINFO_NONE ) > + note = xmalloc_bytes(nr_bytes); Suggest refactoring this (and the other similar places) so that the check for the regular/crash head occurs in one wrapper alloc function. > /* Protect the write into crash_notes[] with a spinlock, as this function > * is on a hotplug path and a hypercall path. */ > @@ -338,6 +426,11 @@ static int kexec_init_cpu_notes(const un > } > else > { > + /* If we care about memory possition, alloc from the crash heap, > + * also protected by the crash_notes_lock. */ > + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) > + note = alloc_from_crash_heap(nr_bytes); > + > crash_notes[cpu].start = note; > crash_notes[cpu].size = nr_bytes; > spin_unlock(&crash_notes_lock); > @@ -397,6 +490,24 @@ static struct notifier_block cpu_nfb = { > .notifier_call = cpu_callback > }; > > +void __init kexec_early_calculations(void) > +{ > + /* If low_crashinfo_mode is still INVALID, neither "low_crashinfo" nor > + * "crashinfo_maxaddr" have been specified on the command line, so > + * explicitly set to NONE. */ > + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) > + low_crashinfo_mode = LOW_CRASHINFO_NONE; Why not initialize low_crash_info_mode to NONE? > + > + crashinfo_maxaddr_bits = 0; > + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) > + { > + paddr_t tmp = crashinfo_maxaddr; > + > + while ( tmp >>= 1 ) > + crashinfo_maxaddr_bits++; > + } > +} Do these during the parsing of the command line? These will allow you to remove this unhelpfully named function. > + > static int __init kexec_init(void) > { > void *cpu = (void *)(unsigned long)smp_processor_id(); > @@ -407,6 +518,29 @@ static int __init kexec_init(void) > > register_keyhandler('C', &crashdump_trigger_keyhandler); > > + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) > + { > + size_t crash_heap_size; > + > + /* This calculation is safe even if the machine is booted in > + * uniprocessor mode. */ > + crash_heap_size = sizeof_cpu_notes(0) + > + sizeof_cpu_notes(1) * (nr_cpu_ids - 1); > + crash_heap_size = PAGE_ALIGN(crash_heap_size); > + > + crash_heap_current = alloc_xenheap_pages( > + get_order_from_bytes(crash_heap_size), > + MEMF_bits(crashinfo_maxaddr_bits) ); > + > + if ( crash_heap_current ) > + crash_heap_end = crash_heap_current + crash_heap_size; > + else > + return -ENOMEM; > + } Suggest moving this into a crash_heap_setup() function. > + > + /* crash_notes may be allocated anywhere Xen can reach in memory. > + Only the individual CPU crash notes themselves must be allocated > + in lower memory if requested. */ > crash_notes = xmalloc_array(crash_note_range_t, nr_cpu_ids); > if ( ! crash_notes ) > return -ENOMEM; > diff -r 3da37c68284f -r 23c31d59ffb1 xen/drivers/char/console.c > --- a/xen/drivers/char/console.c > +++ b/xen/drivers/char/console.c > @@ -584,6 +584,7 @@ void __init console_init_postirq(void) > { > char *ring; > unsigned int i, order; > + u64 memflags; > > serial_init_postirq(); > > @@ -591,7 +592,8 @@ void __init console_init_postirq(void) > opt_conring_size = num_present_cpus() << (9 + xenlog_lower_thresh); > > order = get_order_from_bytes(max(opt_conring_size, conring_size)); > - while ( (ring = alloc_xenheap_pages(order, 0)) == NULL ) > + memflags = low_crashinfo_mode > LOW_CRASHINFO_NONE ? MEMF_bits(crashinfo_maxaddr_bits) : 0; If you set crashinfo_maxaddr_bits to 64 if low_crashinfo_mode isn't used you wouldn't need this test. David ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-22 18:09 ` David Vrabel @ 2011-12-22 18:44 ` Andrew Cooper 2011-12-23 9:09 ` Jan Beulich 0 siblings, 1 reply; 11+ messages in thread From: Andrew Cooper @ 2011-12-22 18:44 UTC (permalink / raw) To: David Vrabel; +Cc: xen-devel, Tim Deegan On 22/12/11 18:09, David Vrabel wrote: > On 22/12/11 17:36, Andrew Cooper wrote: >> On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such >> as the CPU crash notes will go into the xenheap, which tends to be in >> upper memory. This causes problems on machines with more than 64GB >> (or 4GB if no PAE support) of ram as the crashkernel physically cant >> access the crash notes. > I've never been entirely convinced that this was the correct approach. > It seems that using a 64-bit crash kernel would be an easier solution. In an ideal world yes, but reality is that XS will stay with a 32bit kernel for a while yet. >> @@ -320,7 +405,10 @@ static int kexec_init_cpu_notes(const un >> return ret; >> >> nr_bytes = sizeof_cpu_notes(cpu); >> - note = xmalloc_bytes(nr_bytes); >> + >> + /* If we dont care about the position of allocation, malloc. */ >> + if ( low_crashinfo_mode == LOW_CRASHINFO_NONE ) >> + note = xmalloc_bytes(nr_bytes); > Suggest refactoring this (and the other similar places) so that the > check for the regular/crash head occurs in one wrapper alloc function. Can't refactor this as its other half is specifically inside the lock while this is outside. >> /* Protect the write into crash_notes[] with a spinlock, as this function >> * is on a hotplug path and a hypercall path. */ >> @@ -338,6 +426,11 @@ static int kexec_init_cpu_notes(const un >> } >> else >> { >> + /* If we care about memory possition, alloc from the crash heap, >> + * also protected by the crash_notes_lock. */ >> + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) >> + note = alloc_from_crash_heap(nr_bytes); >> + >> crash_notes[cpu].start = note; >> crash_notes[cpu].size = nr_bytes; >> spin_unlock(&crash_notes_lock); >> @@ -397,6 +490,24 @@ static struct notifier_block cpu_nfb = { >> .notifier_call = cpu_callback >> }; >> >> +void __init kexec_early_calculations(void) >> +{ >> + /* If low_crashinfo_mode is still INVALID, neither "low_crashinfo" nor >> + * "crashinfo_maxaddr" have been specified on the command line, so >> + * explicitly set to NONE. */ >> + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) >> + low_crashinfo_mode = LOW_CRASHINFO_NONE; > Why not initialize low_crash_info_mode to NONE? So there is not a dependency between the order of low_crashinfo and crashinfo_maxaddr, while still allowing each option to set a sensible default for the other if only a single one is specified on the command line. >> + >> + crashinfo_maxaddr_bits = 0; >> + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) >> + { >> + paddr_t tmp = crashinfo_maxaddr; >> + >> + while ( tmp >>= 1 ) >> + crashinfo_maxaddr_bits++; >> + } >> +} > Do these during the parsing of the command line? Argument along the same lines as above w.r.t two command line arguments. crashinfo_maxaddr_bits is always used. Therefore it must be 0 in the case where no limits are applied, meaning that setting it to a default of lg(4GB) will break things. Setting it in the parsing function would prevent a sensible default being set if the user only specified low_crashinfo without making a combinational explosion of logic in the parsing function. > These will allow you to remove this unhelpfully named function. > >> + >> static int __init kexec_init(void) >> { >> void *cpu = (void *)(unsigned long)smp_processor_id(); >> @@ -407,6 +518,29 @@ static int __init kexec_init(void) >> >> register_keyhandler('C', &crashdump_trigger_keyhandler); >> >> + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) >> + { >> + size_t crash_heap_size; >> + >> + /* This calculation is safe even if the machine is booted in >> + * uniprocessor mode. */ >> + crash_heap_size = sizeof_cpu_notes(0) + >> + sizeof_cpu_notes(1) * (nr_cpu_ids - 1); >> + crash_heap_size = PAGE_ALIGN(crash_heap_size); >> + >> + crash_heap_current = alloc_xenheap_pages( >> + get_order_from_bytes(crash_heap_size), >> + MEMF_bits(crashinfo_maxaddr_bits) ); >> + >> + if ( crash_heap_current ) >> + crash_heap_end = crash_heap_current + crash_heap_size; >> + else >> + return -ENOMEM; >> + } > Suggest moving this into a crash_heap_setup() function. Questionable. I would argue that it is trading an extra indirection in the source code for a gain of only a few lines, which will be inlined back to here by the compiler. >> + >> + /* crash_notes may be allocated anywhere Xen can reach in memory. >> + Only the individual CPU crash notes themselves must be allocated >> + in lower memory if requested. */ >> crash_notes = xmalloc_array(crash_note_range_t, nr_cpu_ids); >> if ( ! crash_notes ) >> return -ENOMEM; >> diff -r 3da37c68284f -r 23c31d59ffb1 xen/drivers/char/console.c >> --- a/xen/drivers/char/console.c >> +++ b/xen/drivers/char/console.c >> @@ -584,6 +584,7 @@ void __init console_init_postirq(void) >> { >> char *ring; >> unsigned int i, order; >> + u64 memflags; >> >> serial_init_postirq(); >> >> @@ -591,7 +592,8 @@ void __init console_init_postirq(void) >> opt_conring_size = num_present_cpus() << (9 + xenlog_lower_thresh); >> >> order = get_order_from_bytes(max(opt_conring_size, conring_size)); >> - while ( (ring = alloc_xenheap_pages(order, 0)) == NULL ) >> + memflags = low_crashinfo_mode > LOW_CRASHINFO_NONE ? MEMF_bits(crashinfo_maxaddr_bits) : 0; > If you set crashinfo_maxaddr_bits to 64 if low_crashinfo_mode isn't used > you wouldn't need this test. I am not familiar enough with the intricacies of alloc_xenheap_pages to know whether that is safe. cc'ing Tim for his expertise. > David -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-22 18:44 ` Andrew Cooper @ 2011-12-23 9:09 ` Jan Beulich 0 siblings, 0 replies; 11+ messages in thread From: Jan Beulich @ 2011-12-23 9:09 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel, Tim Deegan, David Vrabel >>> On 22.12.11 at 19:44, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 22/12/11 18:09, David Vrabel wrote: >> On 22/12/11 17:36, Andrew Cooper wrote: >>> + memflags = low_crashinfo_mode > LOW_CRASHINFO_NONE ? > MEMF_bits(crashinfo_maxaddr_bits) : 0; >> If you set crashinfo_maxaddr_bits to 64 if low_crashinfo_mode isn't used >> you wouldn't need this test. > > I am not familiar enough with the intricacies of alloc_xenheap_pages to > know whether that is safe. cc'ing Tim for his expertise. Passing 0 for the bit width is equivalent to passing a value beyond what is covering the maximum available address, so you can pick whatever better serves the purpose. Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-22 17:36 ` [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory Andrew Cooper 2011-12-22 18:09 ` David Vrabel @ 2011-12-23 9:06 ` Jan Beulich 2011-12-23 11:59 ` Andrew Cooper 1 sibling, 1 reply; 11+ messages in thread From: Jan Beulich @ 2011-12-23 9:06 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel >>> On 22.12.11 at 18:36, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such > as the CPU crash notes will go into the xenheap, which tends to be in > upper memory. This causes problems on machines with more than 64GB > (or 4GB if no PAE support) of ram as the crashkernel physically cant > access the crash notes. What use is a crash dump taken with a 32-bit kernel on a machine with more than 64G (or more than 4G is the crash kernel doesn't support PAE)? > The solution is to force Xen to allocate certain structures in lower > memory. This is achieved by introducing two new command line > parameters; low_crashinfo and crashinfo_maxaddr. Because of the > potential impact on 32bit PV guests, and that this problem does not > exist for 64bit dom0 on 64bit Xen, this new functionality defaults to > the codebase's previous behavior, requiring the user to explicitly > add extra command line parameters to change the behavior. > > This patch consists of 3 logically distinct but closely related > changes. > 1) Add the two new command line parameters. Do you really need both? Just being able to set the max address would seem sufficient... > 2) Change crash note allocation to use lower memory when instructed. > 3) Change the conring buffer to use lower memory when instructed. Why does this one need moving, but none of the other Xen data structures? If anything, I would have expected you to limit the Xen heap altogether, so that automatically all allocations get done in a way so that at least Xen's data structures are available in the (truncated) dump. > There result is that the crash notes and console ring will be placed > in lower memory so useful information can be recovered in the case of > a crash. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > > diff -r 3da37c68284f -r 23c31d59ffb1 xen/arch/x86/setup.c > --- a/xen/arch/x86/setup.c > +++ b/xen/arch/x86/setup.c > @@ -586,6 +586,10 @@ void __init __start_xen(unsigned long mb > } > cmdline_parse(cmdline); > > + /* Must be after command line argument parsing and before > + * allocing any xenheap structures wanted in lower memory. */ > + kexec_early_calculations(); > + But does it really need to be *this* early? > parse_video_info(); > > set_current((struct vcpu *)0xfffff000); /* debug sanity */ > diff -r 3da37c68284f -r 23c31d59ffb1 xen/common/kexec.c > --- a/xen/common/kexec.c > +++ b/xen/common/kexec.c > @@ -36,7 +36,9 @@ bool_t kexecing = FALSE; > typedef struct { Elf_Note * start; size_t size; } crash_note_range_t; > static crash_note_range_t * crash_notes; > > -/* Lock to prevent race conditions when allocating the crash note buffers. > */ > +/* Lock to prevent race conditions when allocating the crash note buffers. > + * It also serves to protect calls to alloc_from_crash_heap when allocating > + * crash note buffers in lower memory. */ > static DEFINE_SPINLOCK(crash_notes_lock); > > static Elf_Note *xen_crash_note; > @@ -70,6 +72,23 @@ static struct { > (_d_)->start = (_s_)->start; \ > } while (0) > > +/* Low crashinfo mode. Start as INVALID so serveral codepaths can set up > + * defaults without needing to know the state of the others. */ > +enum low_crashinfo low_crashinfo_mode = LOW_CRASHINFO_INVALID; > + > +/* This value is only considered if low_crash_mode is set to MIN or ALL, so > + * setting a default here is safe. Default to 4GB. This is because the > current > + * KEXEC_CMD_get_range compat hypercall trucates 64bit pointers to 32 bits. > The > + * typical usecase for crashinfo_maxaddr will be for 64bit Xen with 32bit > dom0 > + * and 32bit crash kernel. */ > +static paddr_t crashinfo_maxaddr = 4ULL << 30; __initdata > + > +/* = log base 2 of crashinfo_maxaddr after checking for sanity. */ > +paddr_t crashinfo_maxaddr_bits = 0; > + > +/* Pointers to keep track of the crash heap region. */ > +static void *crash_heap_current = NULL, *crash_heap_end = NULL; > + > /* > * Parse command lines in the format > * > @@ -148,6 +167,58 @@ static void __init parse_crashkernel(con > } > custom_param("crashkernel", parse_crashkernel); > > +/* Parse command lines in the format: > + * > + * low_crashinfo=[none,min,all] > + * > + * - none disables the low allocation of crash info. > + * - min will allocate enough low information for the crash kernel to be > able > + * to extract the hypervisor and dom0 message ring buffers. > + * - all will allocate additional structures such as domain and vcpu structs > + * low so the crash kernel can perform an extended analysis of state. > + */ > +static void __init parse_low_crashinfo(const char * str) > +{ > + > + if ( !strlen(str) ) > + /* default to min if user just specifies "low_crashinfo" */ > + low_crashinfo_mode = LOW_CRASHINFO_MIN; > + else if ( !strcmp(str, "none" ) ) > + low_crashinfo_mode = LOW_CRASHINFO_NONE; > + else if ( !strcmp(str, "min" ) ) > + low_crashinfo_mode = LOW_CRASHINFO_MIN; > + else if ( !strcmp(str, "all" ) ) > + low_crashinfo_mode = LOW_CRASHINFO_ALL; > + else > + { > + printk("Unknown low_crashinfo parameter '%s'. Defaulting to > min.\n", str); > + low_crashinfo_mode = LOW_CRASHINFO_MIN; > + } > +} > +custom_param("low_crashinfo", parse_low_crashinfo); > + > +/* Parse command lines in the format: > + * > + * crashinfo_maxaddr=<addr> > + * > + * <addr> will be rounded down to the nearest power of two. Defaults to 64G > + */ > +static void __init parse_crashinfo_maxaddr(const char * str) > +{ > + u64 addr; > + > + /* if low_crashinfo_mode is unset, default to min. */ > + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) > + low_crashinfo_mode = LOW_CRASHINFO_MIN; > + > + if ( (addr = parse_size_and_unit(str, NULL)) ) > + crashinfo_maxaddr = addr; > + else > + printk("Unable to parse crashinfo_maxaddr. Defaulting to %p\n", > + (void*)crashinfo_maxaddr); > +} > +custom_param("crashinfo_maxaddr", parse_crashinfo_maxaddr); > + > void __init set_kexec_crash_area_size(u64 system_ram) > { > unsigned int idx; > @@ -306,6 +377,20 @@ static size_t sizeof_cpu_notes(const uns > return bytes; > } > > +/* Allocate size_t bytes of space from the previously allocated > + * crash heap if the user has requested that crash notes be allocated > + * in lower memory. There is currently no case where the crash notes > + * should be free()'d. */ > +static void * alloc_from_crash_heap(const size_t bytes) > +{ > + void * ret; > + if ( crash_heap_current + bytes > crash_heap_end ) > + return NULL; > + ret = (void*)crash_heap_current; > + crash_heap_current += bytes; > + return ret; > +} > + > /* Allocate a crash note buffer for a newly onlined cpu. */ > static int kexec_init_cpu_notes(const unsigned long cpu) > { > @@ -320,7 +405,10 @@ static int kexec_init_cpu_notes(const un > return ret; > > nr_bytes = sizeof_cpu_notes(cpu); > - note = xmalloc_bytes(nr_bytes); > + > + /* If we dont care about the position of allocation, malloc. */ > + if ( low_crashinfo_mode == LOW_CRASHINFO_NONE ) > + note = xmalloc_bytes(nr_bytes); > > /* Protect the write into crash_notes[] with a spinlock, as this > function > * is on a hotplug path and a hypercall path. */ > @@ -338,6 +426,11 @@ static int kexec_init_cpu_notes(const un > } > else > { > + /* If we care about memory possition, alloc from the crash heap, > + * also protected by the crash_notes_lock. */ > + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) > + note = alloc_from_crash_heap(nr_bytes); > + > crash_notes[cpu].start = note; > crash_notes[cpu].size = nr_bytes; > spin_unlock(&crash_notes_lock); > @@ -397,6 +490,24 @@ static struct notifier_block cpu_nfb = { > .notifier_call = cpu_callback > }; > > +void __init kexec_early_calculations(void) > +{ > + /* If low_crashinfo_mode is still INVALID, neither "low_crashinfo" nor > + * "crashinfo_maxaddr" have been specified on the command line, so > + * explicitly set to NONE. */ > + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) > + low_crashinfo_mode = LOW_CRASHINFO_NONE; > + > + crashinfo_maxaddr_bits = 0; > + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) > + { > + paddr_t tmp = crashinfo_maxaddr; > + > + while ( tmp >>= 1 ) > + crashinfo_maxaddr_bits++; fls() or some of its cousins > + } > +} > + > static int __init kexec_init(void) > { > void *cpu = (void *)(unsigned long)smp_processor_id(); > @@ -407,6 +518,29 @@ static int __init kexec_init(void) > > register_keyhandler('C', &crashdump_trigger_keyhandler); > > + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) > + { > + size_t crash_heap_size; > + > + /* This calculation is safe even if the machine is booted in > + * uniprocessor mode. */ > + crash_heap_size = sizeof_cpu_notes(0) + > + sizeof_cpu_notes(1) * (nr_cpu_ids - 1); > + crash_heap_size = PAGE_ALIGN(crash_heap_size); > + > + crash_heap_current = alloc_xenheap_pages( > + get_order_from_bytes(crash_heap_size), > + MEMF_bits(crashinfo_maxaddr_bits) ); > + > + if ( crash_heap_current ) > + crash_heap_end = crash_heap_current + crash_heap_size; > + else > + return -ENOMEM; Constructs like this generally look bogus to me, as if ( !crash_heap_current ) return -ENOMEM; ... is shorter and (to me at least) more clear. > + } > + > + /* crash_notes may be allocated anywhere Xen can reach in memory. > + Only the individual CPU crash notes themselves must be allocated > + in lower memory if requested. */ > crash_notes = xmalloc_array(crash_note_range_t, nr_cpu_ids); > if ( ! crash_notes ) > return -ENOMEM; > diff -r 3da37c68284f -r 23c31d59ffb1 xen/drivers/char/console.c > --- a/xen/drivers/char/console.c > +++ b/xen/drivers/char/console.c > @@ -584,6 +584,7 @@ void __init console_init_postirq(void) > { > char *ring; > unsigned int i, order; > + u64 memflags; > > serial_init_postirq(); > > @@ -591,7 +592,8 @@ void __init console_init_postirq(void) > opt_conring_size = num_present_cpus() << (9 + xenlog_lower_thresh); > > order = get_order_from_bytes(max(opt_conring_size, conring_size)); > - while ( (ring = alloc_xenheap_pages(order, 0)) == NULL ) > + memflags = low_crashinfo_mode > LOW_CRASHINFO_NONE ? > MEMF_bits(crashinfo_maxaddr_bits) : 0; > + while ( (ring = alloc_xenheap_pages(order, memflags)) == NULL ) > { > BUG_ON(order == 0); > order--; > diff -r 3da37c68284f -r 23c31d59ffb1 xen/include/xen/kexec.h > --- a/xen/include/xen/kexec.h > +++ b/xen/include/xen/kexec.h > @@ -25,6 +25,19 @@ void set_kexec_crash_area_size(u64 syste > #define KEXEC_IMAGE_CRASH_BASE 2 > #define KEXEC_IMAGE_NR 4 > > +enum low_crashinfo { > + LOW_CRASHINFO_INVALID = 0, > + LOW_CRASHINFO_NONE = 1, > + LOW_CRASHINFO_MIN = 2, > + LOW_CRASHINFO_ALL = 3 > +}; > + > +/* Low crashinfo mode. Start as INVALID so serveral codepaths can set up > + * defaults without needing to know the state of the others. */ > +extern enum low_crashinfo low_crashinfo_mode; > +extern paddr_t crashinfo_maxaddr_bits; > +void kexec_early_calculations(void); > + > int machine_kexec_load(int type, int slot, xen_kexec_image_t *image); > void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image); > void machine_kexec_reserved(xen_kexec_reserve_t *reservation); > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory 2011-12-23 9:06 ` Jan Beulich @ 2011-12-23 11:59 ` Andrew Cooper 0 siblings, 0 replies; 11+ messages in thread From: Andrew Cooper @ 2011-12-23 11:59 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel On 23/12/11 09:06, Jan Beulich wrote: >>>> On 22.12.11 at 18:36, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 64bit Xen with 32bit dom0 and crashkernel, xmalloc'ing items such >> as the CPU crash notes will go into the xenheap, which tends to be in >> upper memory. This causes problems on machines with more than 64GB >> (or 4GB if no PAE support) of ram as the crashkernel physically cant >> access the crash notes. > What use is a crash dump taken with a 32-bit kernel on a machine > with more than 64G (or more than 4G is the crash kernel doesn't > support PAE)? Very little use at all, which is the reason for this change. The correct solution is indeed to use a 64bit dom0 kernel, but while politics is preventing that from happening, another solution has to be found. I doubt that XenServer is the only product which would benifit, which is why the changes are presented for upstreaming. >> The solution is to force Xen to allocate certain structures in lower >> memory. This is achieved by introducing two new command line >> parameters; low_crashinfo and crashinfo_maxaddr. Because of the >> potential impact on 32bit PV guests, and that this problem does not >> exist for 64bit dom0 on 64bit Xen, this new functionality defaults to >> the codebase's previous behavior, requiring the user to explicitly >> add extra command line parameters to change the behavior. >> >> This patch consists of 3 logically distinct but closely related >> changes. >> 1) Add the two new command line parameters. > Do you really need both? Just being able to set the max address > would seem sufficient... The plan, as explained in the comments, is for two (or perhaps more) levels of low allocation. "min" means "allocate enough stuff low for the crash kernel to grab the Xen ring buffer" where "all" will include extra things such as the page tables so the crash kernel can generate full stack traces given the PT_STATUS notes. The differentiation is because "min" should not impact very much at all on other memory restricted things such as 32bit PV guests, whereas "all" certainly will. >> 2) Change crash note allocation to use lower memory when instructed. >> 3) Change the conring buffer to use lower memory when instructed. > Why does this one need moving, but none of the other Xen data > structures? If anything, I would have expected you to limit the Xen > heap altogether, so that automatically all allocations get done in a > way so that at least Xen's data structures are available in the > (truncated) dump. This is part of the "min" option which is trying to have the least possible impact. The idea is to have this in low memory, use the "console_to_ring" boot option to copy dom0 dmesg into conring, and pass its physical address and size in a crash note, so that the crash kernel environment grab it all. >> There result is that the crash notes and console ring will be placed >> in lower memory so useful information can be recovered in the case of >> a crash. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> >> diff -r 3da37c68284f -r 23c31d59ffb1 xen/arch/x86/setup.c >> --- a/xen/arch/x86/setup.c >> +++ b/xen/arch/x86/setup.c >> @@ -586,6 +586,10 @@ void __init __start_xen(unsigned long mb >> } >> cmdline_parse(cmdline); >> >> + /* Must be after command line argument parsing and before >> + * allocing any xenheap structures wanted in lower memory. */ >> + kexec_early_calculations(); >> + > But does it really need to be *this* early? Possibly not, but it needs to be before console_preirq_init() which is not much later. >> parse_video_info(); >> >> set_current((struct vcpu *)0xfffff000); /* debug sanity */ >> diff -r 3da37c68284f -r 23c31d59ffb1 xen/common/kexec.c >> --- a/xen/common/kexec.c >> +++ b/xen/common/kexec.c >> @@ -36,7 +36,9 @@ bool_t kexecing = FALSE; >> typedef struct { Elf_Note * start; size_t size; } crash_note_range_t; >> static crash_note_range_t * crash_notes; >> >> -/* Lock to prevent race conditions when allocating the crash note buffers. >> */ >> +/* Lock to prevent race conditions when allocating the crash note buffers. >> + * It also serves to protect calls to alloc_from_crash_heap when allocating >> + * crash note buffers in lower memory. */ >> static DEFINE_SPINLOCK(crash_notes_lock); >> >> static Elf_Note *xen_crash_note; >> @@ -70,6 +72,23 @@ static struct { >> (_d_)->start = (_s_)->start; \ >> } while (0) >> >> +/* Low crashinfo mode. Start as INVALID so serveral codepaths can set up >> + * defaults without needing to know the state of the others. */ >> +enum low_crashinfo low_crashinfo_mode = LOW_CRASHINFO_INVALID; >> + >> +/* This value is only considered if low_crash_mode is set to MIN or ALL, so >> + * setting a default here is safe. Default to 4GB. This is because the >> current >> + * KEXEC_CMD_get_range compat hypercall trucates 64bit pointers to 32 bits. >> The >> + * typical usecase for crashinfo_maxaddr will be for 64bit Xen with 32bit >> dom0 >> + * and 32bit crash kernel. */ >> +static paddr_t crashinfo_maxaddr = 4ULL << 30; > __initdata Good point >> + >> +/* = log base 2 of crashinfo_maxaddr after checking for sanity. */ >> +paddr_t crashinfo_maxaddr_bits = 0; >> + >> +/* Pointers to keep track of the crash heap region. */ >> +static void *crash_heap_current = NULL, *crash_heap_end = NULL; >> + >> /* >> * Parse command lines in the format >> * >> @@ -148,6 +167,58 @@ static void __init parse_crashkernel(con >> } >> custom_param("crashkernel", parse_crashkernel); >> >> +/* Parse command lines in the format: >> + * >> + * low_crashinfo=[none,min,all] >> + * >> + * - none disables the low allocation of crash info. >> + * - min will allocate enough low information for the crash kernel to be >> able >> + * to extract the hypervisor and dom0 message ring buffers. >> + * - all will allocate additional structures such as domain and vcpu structs >> + * low so the crash kernel can perform an extended analysis of state. >> + */ >> +static void __init parse_low_crashinfo(const char * str) >> +{ >> + >> + if ( !strlen(str) ) >> + /* default to min if user just specifies "low_crashinfo" */ >> + low_crashinfo_mode = LOW_CRASHINFO_MIN; >> + else if ( !strcmp(str, "none" ) ) >> + low_crashinfo_mode = LOW_CRASHINFO_NONE; >> + else if ( !strcmp(str, "min" ) ) >> + low_crashinfo_mode = LOW_CRASHINFO_MIN; >> + else if ( !strcmp(str, "all" ) ) >> + low_crashinfo_mode = LOW_CRASHINFO_ALL; >> + else >> + { >> + printk("Unknown low_crashinfo parameter '%s'. Defaulting to >> min.\n", str); >> + low_crashinfo_mode = LOW_CRASHINFO_MIN; >> + } >> +} >> +custom_param("low_crashinfo", parse_low_crashinfo); >> + >> +/* Parse command lines in the format: >> + * >> + * crashinfo_maxaddr=<addr> >> + * >> + * <addr> will be rounded down to the nearest power of two. Defaults to 64G >> + */ >> +static void __init parse_crashinfo_maxaddr(const char * str) >> +{ >> + u64 addr; >> + >> + /* if low_crashinfo_mode is unset, default to min. */ >> + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) >> + low_crashinfo_mode = LOW_CRASHINFO_MIN; >> + >> + if ( (addr = parse_size_and_unit(str, NULL)) ) >> + crashinfo_maxaddr = addr; >> + else >> + printk("Unable to parse crashinfo_maxaddr. Defaulting to %p\n", >> + (void*)crashinfo_maxaddr); >> +} >> +custom_param("crashinfo_maxaddr", parse_crashinfo_maxaddr); >> + >> void __init set_kexec_crash_area_size(u64 system_ram) >> { >> unsigned int idx; >> @@ -306,6 +377,20 @@ static size_t sizeof_cpu_notes(const uns >> return bytes; >> } >> >> +/* Allocate size_t bytes of space from the previously allocated >> + * crash heap if the user has requested that crash notes be allocated >> + * in lower memory. There is currently no case where the crash notes >> + * should be free()'d. */ >> +static void * alloc_from_crash_heap(const size_t bytes) >> +{ >> + void * ret; >> + if ( crash_heap_current + bytes > crash_heap_end ) >> + return NULL; >> + ret = (void*)crash_heap_current; >> + crash_heap_current += bytes; >> + return ret; >> +} >> + >> /* Allocate a crash note buffer for a newly onlined cpu. */ >> static int kexec_init_cpu_notes(const unsigned long cpu) >> { >> @@ -320,7 +405,10 @@ static int kexec_init_cpu_notes(const un >> return ret; >> >> nr_bytes = sizeof_cpu_notes(cpu); >> - note = xmalloc_bytes(nr_bytes); >> + >> + /* If we dont care about the position of allocation, malloc. */ >> + if ( low_crashinfo_mode == LOW_CRASHINFO_NONE ) >> + note = xmalloc_bytes(nr_bytes); >> >> /* Protect the write into crash_notes[] with a spinlock, as this >> function >> * is on a hotplug path and a hypercall path. */ >> @@ -338,6 +426,11 @@ static int kexec_init_cpu_notes(const un >> } >> else >> { >> + /* If we care about memory possition, alloc from the crash heap, >> + * also protected by the crash_notes_lock. */ >> + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) >> + note = alloc_from_crash_heap(nr_bytes); >> + >> crash_notes[cpu].start = note; >> crash_notes[cpu].size = nr_bytes; >> spin_unlock(&crash_notes_lock); >> @@ -397,6 +490,24 @@ static struct notifier_block cpu_nfb = { >> .notifier_call = cpu_callback >> }; >> >> +void __init kexec_early_calculations(void) >> +{ >> + /* If low_crashinfo_mode is still INVALID, neither "low_crashinfo" nor >> + * "crashinfo_maxaddr" have been specified on the command line, so >> + * explicitly set to NONE. */ >> + if ( low_crashinfo_mode == LOW_CRASHINFO_INVALID ) >> + low_crashinfo_mode = LOW_CRASHINFO_NONE; >> + >> + crashinfo_maxaddr_bits = 0; >> + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) >> + { >> + paddr_t tmp = crashinfo_maxaddr; >> + >> + while ( tmp >>= 1 ) >> + crashinfo_maxaddr_bits++; > fls() or some of its cousins Ok >> + } >> +} >> + >> static int __init kexec_init(void) >> { >> void *cpu = (void *)(unsigned long)smp_processor_id(); >> @@ -407,6 +518,29 @@ static int __init kexec_init(void) >> >> register_keyhandler('C', &crashdump_trigger_keyhandler); >> >> + if ( low_crashinfo_mode > LOW_CRASHINFO_NONE ) >> + { >> + size_t crash_heap_size; >> + >> + /* This calculation is safe even if the machine is booted in >> + * uniprocessor mode. */ >> + crash_heap_size = sizeof_cpu_notes(0) + >> + sizeof_cpu_notes(1) * (nr_cpu_ids - 1); >> + crash_heap_size = PAGE_ALIGN(crash_heap_size); >> + >> + crash_heap_current = alloc_xenheap_pages( >> + get_order_from_bytes(crash_heap_size), >> + MEMF_bits(crashinfo_maxaddr_bits) ); >> + >> + if ( crash_heap_current ) >> + crash_heap_end = crash_heap_current + crash_heap_size; >> + else >> + return -ENOMEM; > Constructs like this generally look bogus to me, as > > if ( !crash_heap_current ) > return -ENOMEM; > ... > > is shorter and (to me at least) more clear. True - for a long time I was debating whether to return ENOMEM at this point at not, so I clearly didn't consider clarity when deciding to add it in. I will fix this. >> + } >> + >> + /* crash_notes may be allocated anywhere Xen can reach in memory. >> + Only the individual CPU crash notes themselves must be allocated >> + in lower memory if requested. */ >> crash_notes = xmalloc_array(crash_note_range_t, nr_cpu_ids); >> if ( ! crash_notes ) >> return -ENOMEM; >> diff -r 3da37c68284f -r 23c31d59ffb1 xen/drivers/char/console.c >> --- a/xen/drivers/char/console.c >> +++ b/xen/drivers/char/console.c >> @@ -584,6 +584,7 @@ void __init console_init_postirq(void) >> { >> char *ring; >> unsigned int i, order; >> + u64 memflags; >> >> serial_init_postirq(); >> >> @@ -591,7 +592,8 @@ void __init console_init_postirq(void) >> opt_conring_size = num_present_cpus() << (9 + xenlog_lower_thresh); >> >> order = get_order_from_bytes(max(opt_conring_size, conring_size)); >> - while ( (ring = alloc_xenheap_pages(order, 0)) == NULL ) >> + memflags = low_crashinfo_mode > LOW_CRASHINFO_NONE ? >> MEMF_bits(crashinfo_maxaddr_bits) : 0; >> + while ( (ring = alloc_xenheap_pages(order, memflags)) == NULL ) >> { >> BUG_ON(order == 0); >> order--; >> diff -r 3da37c68284f -r 23c31d59ffb1 xen/include/xen/kexec.h >> --- a/xen/include/xen/kexec.h >> +++ b/xen/include/xen/kexec.h >> @@ -25,6 +25,19 @@ void set_kexec_crash_area_size(u64 syste >> #define KEXEC_IMAGE_CRASH_BASE 2 >> #define KEXEC_IMAGE_NR 4 >> >> +enum low_crashinfo { >> + LOW_CRASHINFO_INVALID = 0, >> + LOW_CRASHINFO_NONE = 1, >> + LOW_CRASHINFO_MIN = 2, >> + LOW_CRASHINFO_ALL = 3 >> +}; >> + >> +/* Low crashinfo mode. Start as INVALID so serveral codepaths can set up >> + * defaults without needing to know the state of the others. */ >> +extern enum low_crashinfo low_crashinfo_mode; >> +extern paddr_t crashinfo_maxaddr_bits; >> +void kexec_early_calculations(void); >> + >> int machine_kexec_load(int type, int slot, xen_kexec_image_t *image); >> void machine_kexec_unload(int type, int slot, xen_kexec_image_t *image); >> void machine_kexec_reserved(xen_kexec_reserve_t *reservation); >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2012-01-03 11:58 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-12-29 15:51 [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory Jan Beulich 2011-12-30 11:19 ` Tim Deegan 2011-12-31 0:11 ` Andrew Cooper 2011-12-31 7:38 ` Jan Beulich 2012-01-03 11:58 ` Andrew Cooper -- strict thread matches above, loose matches on Subject: below -- 2011-12-22 17:36 [PATCH 0 of 3] KEXEC fixes Andrew Cooper 2011-12-22 17:36 ` [PATCH 3 of 3] KEXEC: Allocate crash structures in low memory Andrew Cooper 2011-12-22 18:09 ` David Vrabel 2011-12-22 18:44 ` Andrew Cooper 2011-12-23 9:09 ` Jan Beulich 2011-12-23 9:06 ` Jan Beulich 2011-12-23 11:59 ` Andrew Cooper
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.