* [PATCH 0/5] Add second memory region for crash kernel
From: Vitaly Mayatskikh @ 2010-04-22 16:23 UTC
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vivek Goyal,
	Haren Myneni, Eric Biederman, Neil Horman, Cong Wang, kexec

Patch applies to 2.6.34-rc5

On x86, the kernel starts execution in 32-bit mode even on 64-bit
capable hardware. On a kdump-enabled system, the crashed kernel
switches to 32-bit mode and jumps into the new kernel. This limits
the location of the dump-capture kernel image and its initrd to the
first 4 GB of memory. The switch to 32-bit mode is performed by the
purgatory code, which has relocations of type R_X86_64_32S (32-bit
signed), and this cuts the "good" address space for the crash kernel
down to 2 GB. I/O regions may reduce this space further.
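
The 2 GB limit follows from how a 32-bit signed relocation is range
checked. A minimal user-space sketch of that check, assuming the usual
rule that the value must survive truncation to 32 bits followed by sign
extension; fits_reloc_32s() and region_below_2g() are hypothetical
names used only for illustration and are not part of this series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest non-negative value a signed 32-bit field can hold. */
#define RELOC_32S_MAX 0x7fffffffULL

/*
 * True if addr is unchanged by truncation to 32 bits followed by sign
 * extension to 64 bits. Only the lowest 2 GB and the highest 2 GB of
 * the 64-bit space satisfy this.
 */
static bool fits_reloc_32s(uint64_t addr)
{
	return addr <= RELOC_32S_MAX || addr >= 0xffffffff80000000ULL;
}

/*
 * True if the whole physical region [base, base + size) lies below
 * 2 GB, which is the only part of physical memory that passes the
 * check above.
 */
static bool region_below_2g(uint64_t base, uint64_t size)
{
	uint64_t last;

	assert(size > 0);
	last = base + size - 1;
	return last >= base && last <= RELOC_32S_MAX;
}
```

For example, a 128 MB region at 16 MB passes, while a 2 MB region
starting at 0x7ff00000 does not, because its last byte lies above
0x7fffffff.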

When a system has a lot of memory (hundreds of gigabytes), the
dump-capture kernel also needs a relatively large amount of memory to
account for the old kernel's pages. It may be impossible to reserve
enough memory below 2 GB, or even below 4 GB. The simplest solution is
to break the dump-capture kernel's reserved memory into two regions:
the first (small) region, for the kernel and initrd images, can easily
be placed in the "good" address space at the beginning of physical
memory, and the second region may be located anywhere.

This series of patches implements this approach. It also requires
changes to the kexec utility to make the feature work, but it is
backward compatible: old versions of kexec will work with the new
kernel. I will post the patch for kexec-tools upstream separately.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>

 Documentation/kdump/kdump.txt       |   40 ++++++++
 Documentation/kernel-parameters.txt |   19 +++-
 arch/x86/kernel/setup.c             |   56 +++++++----
 include/linux/kexec.h               |    6 +
 kernel/kexec.c                      |  182 ++++++++++++++++++++++++++---------
 5 files changed, 232 insertions(+), 71 deletions(-)


* [PATCH 1/5] Introduce second memory resource for crash kernel
From: Vitaly Mayatskikh @ 2010-04-22 16:23 UTC
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vivek Goyal,
	Haren Myneni, Eric Biederman, Neil Horman, Cong Wang, kexec

Currently the crash kernel uses only one memory region (described by
a struct resource). When this region becomes large enough, it may be
difficult to place it in a valid address range.

This patch introduces a second memory region that may also be used by
the crash kernel. The first region can be small enough to hold only
the kernel and initrd images at low addresses, and the second region
can be placed almost anywhere.

The second memory resource has a different name so as not to confuse
existing userspace utilities such as kexec.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
---
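
A note on the size arithmetic in crash_get_memory_size(): resources use
an inclusive end address, so the size is end - start + 1, and a resource
left at start == end == 0 contributes 1 byte to the sum. The user-space
sketch below illustrates this; struct region and the helper names are
hypothetical stand-ins, and the guarded variant is only one possible
way to treat an unreserved region as empty, not something this patch
does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: [start, end], inclusive. */
struct region {
	uint64_t start;
	uint64_t end;
};

/* Size computed the way the patch does it: end - start + 1. */
static uint64_t region_size_raw(const struct region *r)
{
	assert(r != NULL);
	return r->end - r->start + 1;
}

/* Variant that treats start == end == 0 as "nothing reserved". */
static uint64_t region_size_guarded(const struct region *r)
{
	assert(r != NULL);
	if (r->start == 0 && r->end == 0)
		return 0;
	return r->end - r->start + 1;
}

/* Total reserved size over the low and high regions. */
static uint64_t crash_total_size(const struct region *low,
				 const struct region *high)
{
	return region_size_guarded(low) + region_size_guarded(high);
}
```

With a 128 MB low region and no high region, the raw sum is 128 MB plus
1 byte, while the guarded sum is exactly 128 MB.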
 include/linux/kexec.h |    1 +
 kernel/kexec.c        |   11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 03e8e8d..1a3b0a3 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -198,6 +198,7 @@ extern struct kimage *kexec_crash_image;
 /* Location of a reserved region to hold the crash kernel.
  */
 extern struct resource crashk_res;
+extern struct resource crashk_res_hi;
 typedef u32 note_buf_t[KEXEC_NOTE_BYTES/4];
 extern note_buf_t __percpu *crash_notes;
 extern u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 87ebe8a..1bd0199 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -49,7 +49,7 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
 size_t vmcoreinfo_size;
 size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
 
-/* Location of the reserved area for the crash kernel */
+/* Location of the reserved area for the crash kernel in low memory */
 struct resource crashk_res = {
 	.name  = "Crash kernel",
 	.start = 0,
@@ -57,6 +57,14 @@ struct resource crashk_res = {
 	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
 
+/* Location of the reserved area for the crash kernel in high memory */
+struct resource crashk_res_hi = {
+	.name  = "Crash high memory",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
+};
+
 int kexec_should_crash(struct task_struct *p)
 {
 	if (in_interrupt() || !p->pid || is_global_init(p) || panic_on_oops)
@@ -1092,6 +1100,7 @@ size_t crash_get_memory_size(void)
 	size_t size;
 	mutex_lock(&kexec_mutex);
 	size = crashk_res.end - crashk_res.start + 1;
+	size += crashk_res_hi.end - crashk_res_hi.start + 1;
 	mutex_unlock(&kexec_mutex);
 	return size;
 }
-- 
1.7.0.1


* [PATCH 2/5] Modify parse_crashkernel* for new syntax
From: Vitaly Mayatskikh @ 2010-04-22 16:23 UTC
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vivek Goyal,
	Haren Myneni, Eric Biederman, Neil Horman, Cong Wang, kexec

The crashkernel= kernel command-line syntax is extended to allow
reservation of two memory regions for the dump-capture kernel.

The syntax for the simple case changes from

    crashkernel=size[@offset]

to

    crashkernel=<low>[/<high>]

where <low> and <high> are memory regions for the dump-capture kernel
in the usual crashkernel format (size[@offset]).

The crashkernel syntax for conditional reservation based on memory
size changes from

    crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]

to

    crashkernel=<range1>:<low_size1>[/<high_size1>]
                [,<range2>:<low_size2>[/<high_size2>],...]
                [@low_offset[/high_offset]]

The new syntax is backward compatible.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
---
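
For readers who want to experiment with the simple syntax outside the
kernel, here is a minimal user-space sketch of the
size[@offset][/size_hi[@offset_hi]] parsing. parse_mem() is a
hypothetical, simplified stand-in for the kernel's memparse() (only
K, M and G suffixes), and parse_region(), parse_simple() and
struct crash_regions are illustrative names that mirror, but are not,
the functions in the diff below:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for memparse(): number plus optional K/M/G. */
static uint64_t parse_mem(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parses one "size[@offset]" region. Returns a pointer to the first
 * unparsed character, or NULL if no size was found.
 */
static char *parse_region(const char *s, uint64_t *size, uint64_t *base)
{
	char *cur;

	assert(s && size && base);
	*size = parse_mem(s, &cur);
	if (cur == s)
		return NULL;
	if (*cur == '@')
		*base = parse_mem(cur + 1, &cur);
	return cur;
}

struct crash_regions {
	uint64_t size, base;		/* low region */
	uint64_t size_hi, base_hi;	/* high region, 0 if absent */
};

/* Parses "size[@offset][/size_hi[@offset_hi]]"; 0 on success, -1 on error. */
static int parse_simple(const char *s, struct crash_regions *out)
{
	char *cur;

	*out = (struct crash_regions){0};
	cur = parse_region(s, &out->size, &out->base);
	if (!cur)
		return -1;
	if (*cur == '/' &&
	    !parse_region(cur + 1, &out->size_hi, &out->base_hi))
		return -1;
	return 0;
}
```

With this sketch, "128M@16M/1G@4G" describes a 128 MB low region at
16 MB and a 1 GB high region at 4 GB (the values are purely
illustrative), while the old form "64M" still parses and leaves the
high region empty.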
 include/linux/kexec.h |    5 ++
 kernel/kexec.c        |  116 +++++++++++++++++++++++++++++++++++++------------
 2 files changed, 93 insertions(+), 28 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1a3b0a3..d2063f8 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -207,6 +207,11 @@ extern size_t vmcoreinfo_max_size;
 
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int __init parse_crashkernel_ext(char *cmdline, unsigned long long system_ram,
+				 unsigned long long *crash_size,
+				 unsigned long long *crash_base,
+				 unsigned long long *crash_size_hi,
+				 unsigned long long *crash_base_hi);
 int crash_shrink_memory(unsigned long new_size);
 size_t crash_get_memory_size(void);
 
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 1bd0199..b8fd6eb 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1229,23 +1229,42 @@ module_init(crash_notes_memory_init)
  */
 
 
+static char * __init parse_crashkernel_region(char		*cmdline,
+					     unsigned long long	*crash_size,
+					     unsigned long long	*crash_base)
+{
+	char *cur = cmdline;
+
+	*crash_size = memparse(cmdline, &cur);
+	if (cmdline == cur) {
+		pr_warning("crashkernel: memory value expected\n");
+		return NULL;
+	}
+
+	if (*cur == '@')
+		*crash_base = memparse(cur + 1, &cur);
+	return cur;
+}
+
 /*
  * This function parses command lines in the format
  *
- *   crashkernel=ramsize-range:size[,...][@offset]
+ *   crashkernel=ramsize-range:size[/size2][,...][@offset][/offset2]
  *
  * The function returns 0 on success and -EINVAL on failure.
  */
-static int __init parse_crashkernel_mem(char 			*cmdline,
+static int __init parse_crashkernel_mem(char			*cmdline,
 					unsigned long long	system_ram,
 					unsigned long long	*crash_size,
-					unsigned long long	*crash_base)
+					unsigned long long	*crash_base,
+					unsigned long long	*crash_size_hi,
+					unsigned long long	*crash_base_hi)
 {
 	char *cur = cmdline, *tmp;
 
 	/* for each entry of the comma-separated list */
 	do {
-		unsigned long long start, end = ULLONG_MAX, size;
+		unsigned long long start, end = ULLONG_MAX, size, size_hi = 0;
 
 		/* get the start of the range */
 		start = memparse(cur, &tmp);
@@ -1287,6 +1306,17 @@ static int __init parse_crashkernel_mem(char 			*cmdline,
 			return -EINVAL;
 		}
 		cur = tmp;
+
+		if (*cur == '/') {
+			cur++;
+			size_hi = memparse(cur, &tmp);
+			if (cur == tmp) {
+				pr_warning("Memory value expected\n");
+				return -EINVAL;
+			}
+			cur = tmp;
+		}
+
 		if (size >= system_ram) {
 			pr_warning("crashkernel: invalid size\n");
 			return -EINVAL;
@@ -1295,6 +1325,8 @@ static int __init parse_crashkernel_mem(char 			*cmdline,
 		/* match ? */
 		if (system_ram >= start && system_ram < end) {
 			*crash_size = size;
+			if (crash_size_hi)
+				*crash_size_hi = size_hi;
 			break;
 		}
 	} while (*cur++ == ',');
@@ -1310,6 +1342,17 @@ static int __init parse_crashkernel_mem(char 			*cmdline,
 						"after '@'\n");
 				return -EINVAL;
 			}
+			cur = tmp;
+			if (*cur == '/') {
+				cur++;
+				if (crash_base_hi)
+					*crash_base_hi = memparse(cur, &tmp);
+				if (cur == tmp) {
+					pr_warning("Memory value expected "
+						   "after '/'\n");
+					return -EINVAL;
+				}
+			}
 		}
 	}
 
@@ -1319,43 +1362,46 @@ static int __init parse_crashkernel_mem(char 			*cmdline,
 /*
  * That function parses "simple" (old) crashkernel command lines like
  *
- * 	crashkernel=size[@offset]
+ *	crashkernel=size[@offset][/size_hi[@offset_hi]]
  *
  * It returns 0 on success and -EINVAL on failure.
  */
-static int __init parse_crashkernel_simple(char 		*cmdline,
-					   unsigned long long 	*crash_size,
-					   unsigned long long 	*crash_base)
+static int __init parse_crashkernel_simple(char			*cmdline,
+					   unsigned long long	*crash_size,
+					   unsigned long long	*crash_base,
+					   unsigned long long	*crash_size_hi,
+					   unsigned long long	*crash_base_hi)
 {
-	char *cur = cmdline;
+	char *cur = parse_crashkernel_region(cmdline, crash_size, crash_base);
 
-	*crash_size = memparse(cmdline, &cur);
-	if (cmdline == cur) {
-		pr_warning("crashkernel: memory value expected\n");
+	if (!cur) {
 		return -EINVAL;
+	} else if (*cur == '/' && crash_size_hi && crash_base_hi) {
+		cur = parse_crashkernel_region(cur + 1, crash_size_hi,
+					       crash_base_hi);
+		if (!cur)
+			return -EINVAL;
 	}
-
-	if (*cur == '@')
-		*crash_base = memparse(cur+1, &cur);
-
 	return 0;
 }
 
-/*
- * That function is the entry point for command line parsing and should be
- * called from the arch-specific code.
- */
-int __init parse_crashkernel(char 		 *cmdline,
-			     unsigned long long system_ram,
-			     unsigned long long *crash_size,
-			     unsigned long long *crash_base)
+int __init parse_crashkernel_ext(char		 *cmdline,
+				 unsigned long long system_ram,
+				 unsigned long long *crash_size,
+				 unsigned long long *crash_base,
+				 unsigned long long *crash_size_hi,
+				 unsigned long long *crash_base_hi)
 {
-	char 	*p = cmdline, *ck_cmdline = NULL;
+	char	*p = cmdline, *ck_cmdline = NULL;
 	char	*first_colon, *first_space;
 
 	BUG_ON(!crash_size || !crash_base);
 	*crash_size = 0;
 	*crash_base = 0;
+	if (crash_size_hi)
+		*crash_size_hi = 0;
+	if (crash_base_hi)
+		*crash_base_hi = 0;
 
 	/* find crashkernel and use the last one if there are more */
 	p = strstr(p, "crashkernel=");
@@ -1377,15 +1423,29 @@ int __init parse_crashkernel(char 		 *cmdline,
 	first_space = strchr(ck_cmdline, ' ');
 	if (first_colon && (!first_space || first_colon < first_space))
 		return parse_crashkernel_mem(ck_cmdline, system_ram,
-				crash_size, crash_base);
+					     crash_size, crash_base,
+					     crash_size_hi, crash_base_hi);
 	else
 		return parse_crashkernel_simple(ck_cmdline, crash_size,
-				crash_base);
+						crash_base, crash_size_hi,
+						crash_base_hi);
 
 	return 0;
 }
 
-
+/*
+ * That function is the entry point for command line parsing and should be
+ * called from the arch-specific code.
+ */
+int __init parse_crashkernel(char		 *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return parse_crashkernel_ext(cmdline, system_ram,
+				     crash_size, crash_base,
+				     NULL, NULL);
+}
 
 void crash_save_vmcoreinfo(void)
 {
-- 
1.7.0.1


* [PATCH 3/5] Support second memory region in crash_shrink_memory()
From: Vitaly Mayatskikh @ 2010-04-22 16:23 UTC
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vivek Goyal,
	Haren Myneni, Eric Biederman, Neil Horman, Cong Wang, kexec

This patch changes crash_shrink_memory() to also handle the previously
added second memory region. When a shrink occurs, the second region is
shrunk first.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
---
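
The shrink policy can be modelled on plain sizes. The sketch below is a
simplification that ignores page rounding, resource release and the
actual freeing of pages; struct crash_sizes and shrink_sizes() are
hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct crash_sizes {
	uint64_t low;	/* size of the low region */
	uint64_t high;	/* size of the high region */
};

/*
 * Shrinks the total reservation to new_size, taking from the high
 * region first. Returns 0 on success, or -1 (leaving *s untouched)
 * if new_size is larger than the current total.
 */
static int shrink_sizes(struct crash_sizes *s, uint64_t new_size)
{
	assert(s != NULL);
	if (new_size > s->low + s->high)
		return -1;
	if (new_size < s->low) {
		/* The high region goes away entirely, then low shrinks. */
		s->high = 0;
		s->low = new_size;
	} else {
		/* The low region is kept; only the high region shrinks. */
		s->high = new_size - s->low;
	}
	return 0;
}
```

Starting from low = 128 and high = 1024 (in arbitrary units), shrinking
to 640 leaves low = 128 and high = 512; shrinking further to 64 leaves
low = 64 and high = 0.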
 kernel/kexec.c |   55 ++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index b8fd6eb..dfaa01e 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1117,10 +1117,36 @@ static void free_reserved_phys_range(unsigned long begin, unsigned long end)
 	}
 }
 
+int crash_shrink_region(struct resource *crashk, unsigned long new_size)
+{
+	unsigned long start, end, size;
+
+	start = crashk->start;
+	end = crashk->end;
+	size = end - start + 1;
+
+	if (!size || new_size == size) /* Nothing to free */
+		return 0;
+
+	if (new_size > size)
+		return -EINVAL;
+
+	start = roundup(start, PAGE_SIZE);
+	end = roundup(start + new_size, PAGE_SIZE);
+
+	free_reserved_phys_range(end, crashk->end);
+
+	if (start == end)
+		release_resource(crashk);
+	crashk->end = end - 1;
+
+	return 0;
+}
+
 int crash_shrink_memory(unsigned long new_size)
 {
 	int ret = 0;
-	unsigned long start, end;
+	unsigned long crash_size, low_size;
 
 	mutex_lock(&kexec_mutex);
 
@@ -1128,26 +1154,25 @@ int crash_shrink_memory(unsigned long new_size)
 		ret = -ENOENT;
 		goto unlock;
 	}
-	start = crashk_res.start;
-	end = crashk_res.end;
 
-	if (new_size >= end - start + 1) {
+	crash_size = low_size = crashk_res.end - crashk_res.start + 1;
+	crash_size += crashk_res_hi.end - crashk_res_hi.start + 1;
+
+	if (crash_size == new_size)
+		goto unlock;
+	if (crash_size < new_size) {
 		ret = -EINVAL;
-		if (new_size == end - start + 1)
-			ret = 0;
 		goto unlock;
 	}
 
-	start = roundup(start, PAGE_SIZE);
-	end = roundup(start + new_size, PAGE_SIZE);
-
-	free_reserved_phys_range(end, crashk_res.end);
-
-	if (start == end) {
-		crashk_res.end = end;
-		release_resource(&crashk_res);
+	if (new_size < low_size) {
+		/* Reap crashk_res_hi */
+		ret = crash_shrink_region(&crashk_res_hi, 0);
+		if (ret)
+			goto unlock;
+		ret = crash_shrink_region(&crashk_res, new_size);
 	} else
-		crashk_res.end = end - 1;
+		ret = crash_shrink_region(&crashk_res_hi, new_size - low_size);
 
 unlock:
 	mutex_unlock(&kexec_mutex);
-- 
1.7.0.1


* [PATCH 4/5] x86: use second memory region for dump-capture kernel
From: Vitaly Mayatskikh @ 2010-04-22 16:23 UTC
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vivek Goyal,
	Haren Myneni, Eric Biederman, Neil Horman, Cong Wang, kexec

This patch adds support for the second memory region to kexec on the
x86 platform.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
---
 arch/x86/kernel/setup.c |   56 +++++++++++++++++++++++++++++-----------------
 1 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index c4851ef..9b395bb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -501,19 +501,11 @@ static inline unsigned long long get_total_mem(void)
 	return total << PAGE_SHIFT;
 }
 
-static void __init reserve_crashkernel(void)
+static int __init reserve_crashkernel_region(char *region_name,
+					     struct resource *crashk,
+					     unsigned long long crash_size,
+					     unsigned long long crash_base)
 {
-	unsigned long long total_mem;
-	unsigned long long crash_size, crash_base;
-	int ret;
-
-	total_mem = get_total_mem();
-
-	ret = parse_crashkernel(boot_command_line, total_mem,
-			&crash_size, &crash_base);
-	if (ret != 0 || crash_size <= 0)
-		return;
-
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
 		const unsigned long long alignment = 16<<20;	/* 16M */
@@ -522,7 +514,7 @@ static void __init reserve_crashkernel(void)
 				 alignment);
 		if (crash_base == -1ULL) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
+			return -EINVAL;
 		}
 	} else {
 		unsigned long long start;
@@ -531,20 +523,42 @@ static void __init reserve_crashkernel(void)
 				 1<<20);
 		if (start != crash_base) {
 			pr_info("crashkernel reservation failed - memory is in use.\n");
-			return;
+			return -EINVAL;
 		}
 	}
-	reserve_early(crash_base, crash_base + crash_size, "CRASH KERNEL");
+	reserve_early(crash_base, crash_base + crash_size, region_name);
 
 	printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
-			"for crashkernel (System RAM: %ldMB)\n",
+			"for crashkernel\n",
 			(unsigned long)(crash_size >> 20),
-			(unsigned long)(crash_base >> 20),
-			(unsigned long)(total_mem >> 20));
+			(unsigned long)(crash_base >> 20));
+
+	crashk->start = crash_base;
+	crashk->end   = crash_base + crash_size - 1;
+	insert_resource(&iomem_resource, crashk);
+	return 0;
+}
+
+static void __init reserve_crashkernel(void)
+{
+	unsigned long long total_mem;
+	unsigned long long crash_size, crash_base;
+	unsigned long long crash_size_hi, crash_base_hi;
+	int ret;
+
+	total_mem = get_total_mem();
+
+	ret = parse_crashkernel_ext(boot_command_line, total_mem,
+				    &crash_size, &crash_base,
+				    &crash_size_hi, &crash_base_hi);
+	if (ret != 0 || crash_size <= 0)
+		return;
 
-	crashk_res.start = crash_base;
-	crashk_res.end   = crash_base + crash_size - 1;
-	insert_resource(&iomem_resource, &crashk_res);
+	ret = reserve_crashkernel_region("CRASH KERNEL", &crashk_res,
+					 crash_size, crash_base);
+	if (ret == 0 && crash_size_hi > 0)
+		reserve_crashkernel_region("CRASH HIMEM", &crashk_res_hi,
+					   crash_size_hi, crash_base_hi);
 }
 #else
 static void __init reserve_crashkernel(void)
-- 
1.7.0.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/5] kexec: update documentation
  2010-04-22 16:23 ` Vitaly Mayatskikh
@ 2010-04-22 16:23   ` Vitaly Mayatskikh
  -1 siblings, 0 replies; 30+ messages in thread
From: Vitaly Mayatskikh @ 2010-04-22 16:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vivek Goyal,
	Haren Myneni, Eric Biederman, Neil Horman, Cong Wang, kexec

Mention the new crashkernel= syntax in the documentation.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
---
 Documentation/kdump/kdump.txt       |   40 +++++++++++++++++++++++++++++++++++
 Documentation/kernel-parameters.txt |   19 +++++++++++-----
 2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index cab61d8..9f93d17 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -266,7 +266,47 @@ This would mean:
     2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
     3) if the RAM size is larger than 2G, then reserve 128M
 
+Avoiding memory reservation problem on large systems
+====================================================
 
+For large systems with a huge amount of memory, the dump-capture kernel
+requires more memory to properly handle the old kernel's pages. However,
+this runs into hardware-dependent limitations on some platforms. For
+example, on an x86-64 system the kernel and initrd still have to be placed
+in the first 2 gigabytes, because the kernel starts executing in 32-bit
+mode and the kdump purgatory code can jump only to 32-bit signed addresses.
+This limitation is a real problem when the dump-capture region is large
+and cannot fit in that area. For such cases it is possible to use a
+special crashkernel syntax:
+
+    crashkernel=<low>/<high>
+
+<low> and <high> are memory regions for the dump-capture kernel in the
+usual crashkernel format (size@offset). For example:
+
+    crashkernel=64M/1G@4G
+
+This would allocate 64M of memory at the lowest valid address and 1G at
+physical address 4G.
+
+New syntax for extended format (in case of memory dependent
+reservation):
+
+    crashkernel=<range1>:<low_size1>[/<high_size1>]
+                [,<range2>:<low_size2>[/<high_size2>],...]
+                [@low_offset][/high_offset]
+    range=start-[end]
+
+For example:
+
+    crashkernel=2G-32G:256M,32G-:256M/1G@0/8G
+
+This would mean:
+
+    1) if the RAM is smaller than 2G, then don't reserve anything
+    2) if the RAM size is between 2G and 32G (exclusive), then reserve 256M
+    3) if the RAM size is larger than 32G, then reserve 256M at the first
+    suitable address (offset 0 means automatic) and reserve 1G at address 8G
 
 Boot into System Kernel
 =======================
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e2202e9..5e9f234 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -568,16 +568,23 @@ and is between 256 and 4096 characters. It is defined in the file
 			Format:
 			<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
 
-	crashkernel=nn[KMG]@ss[KMG]
-			[KNL] Reserve a chunk of physical memory to
-			hold a kernel to switch to with kexec on panic.
-
-	crashkernel=range1:size1[,range2:size2,...][@offset]
-			[KNL] Same as above, but depends on the memory
+	crashkernel=	[KNL]
+		nn[KMG]@ss[KMG]
+			Reserve a chunk of physical memory to hold a
+			kernel to switch to with kexec on panic.
+		nn1[KMG]@ss1[KMG]/nn2[KMG]@ss2[KMG]
+			Same as above, but reserve 2 chunks of
+			physical memory.
+
+	crashkernel=	[KNL]
+		range1:size1[,range2:size2,...][@offset]
+			Same as above, but depends on the memory
 			in the running system. The syntax of range is
 			start-[end] where start and end are both
 			a memory unit (amount[KMG]). See also
 			Documentation/kdump/kdump.txt for a example.
+		range1:size1lo/size1hi[,range2:size2lo/size2hi,...][@offset_lo][/offset_hi]
+			Same as above, but reserve 2 chunks of memory.
 
 	cs89x0_dma=	[HW,NET]
 			Format: <dma>
-- 
1.7.0.1
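
The simple <low>/<high> form documented above could be parsed along the
following lines. This is a user-space sketch, not the parser from this
series: struct crash_region, parse_mem(), parse_region() and
parse_crashkernel_simple() are hypothetical names, and the range-based
extended format is not handled.

```c
#include <assert.h>
#include <stdlib.h>

struct crash_region {
	unsigned long long size;
	unsigned long long base;	/* 0 means "pick automatically" */
};

/* Parse a number with an optional K, M or G suffix; advance *s past it. */
unsigned long long parse_mem(const char **s)
{
	char *end;
	unsigned long long val = strtoull(*s, &end, 0);

	if (end == *s)
		return 0;	/* no digits: leave *s untouched */

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		end++;
		break;
	default:
		break;
	}
	*s = end;
	return val;
}

/* Parse one "size[@offset]" item. Returns 0 on success, -1 on error. */
int parse_region(const char **s, struct crash_region *r)
{
	const char *start = *s;

	r->size = parse_mem(s);
	r->base = 0;
	if (*s == start)
		return -1;		/* no size given */
	if (**s == '@') {
		(*s)++;
		start = *s;
		r->base = parse_mem(s);
		if (*s == start)
			return -1;	/* '@' without an offset */
	}
	return 0;
}

/* Parse "<low>[/<high>]"; *high is zeroed when no high part is given. */
int parse_crashkernel_simple(const char *arg, struct crash_region *low,
			     struct crash_region *high)
{
	assert(arg && low && high);
	high->size = 0;
	high->base = 0;
	if (parse_region(&arg, low))
		return -1;
	if (*arg == '/') {
		arg++;
		if (parse_region(&arg, high))
			return -1;
	}
	return (*arg == '\0' || *arg == ' ') ? 0 : -1;
}
```

For "64M/1G@4G" this yields a 64M low region with an automatic base and
a 1G high region based at 4G, matching the example in the patch.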


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-22 16:23 ` Vitaly Mayatskikh
@ 2010-04-22 22:07   ` Eric W. Biederman
  -1 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2010-04-22 22:07 UTC (permalink / raw)
  To: Vitaly Mayatskikh
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Vivek Goyal, Haren Myneni, Neil Horman, Cong Wang, kexec

Vitaly Mayatskikh <v.mayatskih@gmail.com> writes:

> Patch applies to 2.6.34-rc5
>
> On x86 platform, even if hardware is 64-bit capable, kernel starts
> execution in 32-bit mode. When system is kdump-enabled, crashed kernel
> switches to 32-bit mode and jumps into new kernel. This automatically
> limits location of dump-capture kernel image and its initrd to the first
> 4 GB of memory. Switching to 32-bit mode is performed by purgatory
> code, which has relocations of type R_X86_64_32S (32-bit signed), and
> this cuts "good" address space for crash kernel down to 2 GB. I/O
> regions may cut down this space further.
>
> When system has a lot of memory (hundreds of gigabytes), dump-capture
> kernel also needs relatively a lot of memory to account for old kernel's
> pages. It may be impossible to reserve enough memory below 2 or even 4
> GB. Simplest solution is to break dump-capture kernel's reserved
> memory region into two pieces: first (small) region for kernel and
> initrd images may be easily placed in "good" address space in the
> beginning of physical memory, and second region may be located
> anywhere.
>
> This series of patches implements this approach. It also requires changes
> in kexec utility to make this feature work, but is
> backward-compatible: old versions of kexec will work with new
> kernel. I will post patch to kexec-tools upstream separately.

Have you tried loading a 64bit vmlinux directly into a higher address
range?  There may be a bit or two missing but you should be able to
load a linux kernel above 4GB.  I tested the basics of that mechanism
when I made the 64bit relocatable kernel.

I don't buy the argument that there is a direct connection between
the amount of memory you have and how much memory it takes to dump it.
Even an indirect connection seems suspicious.

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-22 22:07   ` Eric W. Biederman
@ 2010-04-22 22:37     ` H. Peter Anvin
  -1 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2010-04-22 22:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vitaly Mayatskikh, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Vivek Goyal, Haren Myneni, Neil Horman, Cong Wang, kexec

On 04/22/2010 03:07 PM, Eric W. Biederman wrote:
> 
> Have you tried loading a 64bit vmlinux directly into a higher address
> range?  There may be a bit or two missing but you should be able to
> load a linux kernel above 4GB.  I tested the basics of that mechanism
> when I made the 64bit relocatable kernel.
> 
> I don't buy the argument that there is a direct connection between
> the amount of memory you have and how much memory it takes to dump it.
> Even an indirect connection seems suspicious.
> 

We actually have a 64-bit entry point even in bzImage; it is at offset
+0x200 from the 32-bit entry point.  Right now that offset is not
exported anywhere, but it has been stable for a very long time... at
least for as far back as the decompressor has been 64 bits.

The interface to the 64-bit code is by necessity wider, since there is
no such thing as paging off in 64-bit mode, but it probably isn't *too*
hard to figure out how page tables need to be set up in order to work
properly.  At that point, it would be good to document it.
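
The address arithmetic described above, as a minimal sketch.
ENTRY64_OFFSET and bzimage_entry64() are hypothetical names; the 0x200
value is the one quoted in this message, and building the page tables
needed before the jump is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Offset quoted in the message; an assumption, not an exported constant. */
#define ENTRY64_OFFSET 0x200u

/* Address of the 64-bit entry point, given the 32-bit entry point. */
uint64_t bzimage_entry64(uint64_t entry32)
{
	/* Guard against wrap-around at the top of the address space. */
	assert(entry32 <= UINT64_MAX - ENTRY64_OFFSET);
	return entry32 + ENTRY64_OFFSET;
}
```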

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-22 22:07   ` Eric W. Biederman
@ 2010-04-22 22:45     ` Vivek Goyal
  -1 siblings, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2010-04-22 22:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vitaly Mayatskikh, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Haren Myneni, Neil Horman, Cong Wang, kexec

On Thu, Apr 22, 2010 at 03:07:11PM -0700, Eric W. Biederman wrote:
> Vitaly Mayatskikh <v.mayatskih@gmail.com> writes:
> 
> > Patch applies to 2.6.34-rc5
> >
> > On x86 platform, even if hardware is 64-bit capable, kernel starts
> > execution in 32-bit mode. When system is kdump-enabled, crashed kernel
> > switches to 32-bit mode and jumps into new kernel. This automatically
> > limits location of dump-capture kernel image and its initrd to the first
> > 4 GB of memory. Switching to 32-bit mode is performed by purgatory
> > code, which has relocations of type R_X86_64_32S (32-bit signed), and
> > this cuts "good" address space for crash kernel down to 2 GB. I/O
> > regions may cut down this space further.
> >
> > When system has a lot of memory (hundreds of gigabytes), dump-capture
> > kernel also needs relatively a lot of memory to account for old kernel's
> > pages. It may be impossible to reserve enough memory below 2 or even 4
> > GB. Simplest solution is to break dump-capture kernel's reserved
> > memory region into two pieces: first (small) region for kernel and
> > initrd images may be easily placed in "good" address space in the
> > beginning of physical memory, and second region may be located
> > anywhere.
> >
> > This series of patches implements this approach. It also requires changes
> > in kexec utility to make this feature work, but is
> > backward-compatible: old versions of kexec will work with new
> > kernel. I will post patch to kexec-tools upstream separately.
> 
> Have you tried loading a 64bit vmlinux directly into a higher address
> range?  There may be a bit or two missing but you should be able to
> load a linux kernel above 4GB.  I tested the basics of that mechanism
> when I made the 64bit relocatable kernel.

I guess even if it works, it will be an additional liability for
distributions to carry vmlinux (instead of a relocatable bzImage). So we
will have to find a way to make bzImage work.

> 
> I don't buy the argument that there is a direct connection between
> the amount of memory you have and how much memory it takes to dump it.
> Even an indirect connection seems suspicious.

The memory requirement of user space tools, such as dump filtering
tools, might be of interest though. I vaguely remember that the filtering
tool used to first traverse all the memory pages, create some internal
data structures and then start dumping.

So the memory required by the filtering tool might be directly proportional
to the amount of memory present in the system.

Vitaly, have you really run into cases where the 2G upper limit is a concern?
What is the configuration you have, how much memory does it have and how
much memory are you planning to reserve for the kdump kernel?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-22 22:45     ` Vivek Goyal
@ 2010-04-23  0:48       ` Eric W. Biederman
  -1 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2010-04-23  0:48 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Vitaly Mayatskikh, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Haren Myneni, Neil Horman, Cong Wang, kexec

Vivek Goyal <vgoyal@redhat.com> writes:

> On Thu, Apr 22, 2010 at 03:07:11PM -0700, Eric W. Biederman wrote:
>> Vitaly Mayatskikh <v.mayatskih@gmail.com> writes:
>> >
>> > This series of patches implements this approach. It also requires changes
>> > in kexec utility to make this feature work, but is
>> > backward-compatible: old versions of kexec will work with new
>> > kernel. I will post patch to kexec-tools upstream separately.
>> 
>> Have you tried loading a 64bit vmlinux directly into a higher address
>> range?  There may be a bit or two missing but you should be able to
>> load a linux kernel above 4GB.  I tested the basics of that mechanism
>> when I made the 64bit relocatable kernel.
>
> I guess even if it works, it will be an additional liability for
> distributions to carry vmlinux (instead of a relocatable bzImage). So we
> will have to find a way to make bzImage work.

As Peter pointed out, we actually have everything we need except
a bit of documentation and the flag that says this is a 64bit kernel.

From a testing perspective a 64bit vmlinux should work today without
changes.  Once it is confirmed there is a solution with the 64bit
kernel we just need a small patch to boot.txt and a few tweaks to
/sbin/kexec to handle a 64bit bzImage.

>> I don't buy the argument that there is a direct connection between
>> the amount of memory you have and how much memory it takes to dump it.
>> Even an indirect connection seems suspicious.
>
> The memory requirement of user space tools, such as dump filtering
> tools, might be of interest though. I vaguely remember that the filtering
> tool used to first traverse all the memory pages, create some internal
> data structures and then start dumping.
>
> So the memory required by the filtering tool might be directly proportional
> to the amount of memory present in the system.

Assuming your dump filtering tool creates a bitmap of pages to be dumped
you get a ratio of 32K to 1.  Or 3MB for 100GB and 32MB for 1TB.
Which is noticeable in the worst case but definitely not enough to push
us past 2GB.
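
The arithmetic above can be checked with a short sketch, assuming one
bit per 4 KiB page. The page size, PAGE_SIZE_BYTES and
dump_bitmap_bytes() are assumptions for illustration and are not taken
from any particular filtering tool.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ull	/* assumed x86 base page size */

/* Bytes needed for a bitmap holding one bit per page of RAM. */
uint64_t dump_bitmap_bytes(uint64_t ram_bytes)
{
	uint64_t pages;

	/* Guard the round-up below against overflow. */
	assert(ram_bytes <= UINT64_MAX - (PAGE_SIZE_BYTES - 1));
	pages = (ram_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;

	return (pages + 7) / 8;	/* 8 pages per bitmap byte */
}
```

Under these assumptions each bitmap byte covers 8 * 4096 = 32768 bytes
of RAM, so 100 GiB needs 3,276,800 bytes (about 3.1 MiB) and 1 TiB
needs 32 MiB.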

> Vitaly, have you really run into cases where the 2G upper limit is a concern?
> What is the configuration you have, how much memory does it have and how
> much memory are you planning to reserve for the kdump kernel?

A good question.

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-23  0:48       ` Eric W. Biederman
@ 2010-04-23  5:21         ` Cong Wang
  -1 siblings, 0 replies; 30+ messages in thread
From: Cong Wang @ 2010-04-23  5:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vivek Goyal, Vitaly Mayatskikh, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Haren Myneni, Neil Horman, kexec

Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
>> Vitaly, have you really run into cases where 2G upper limit is a concern.
>> What is the configuration you have, how much memory it has and how much
>> memory are you planning to reserve for kdump kernel?
> 
> A good question.
> 

We have observed that on a machine which has 66G memory, when we do
crashkernel=1G@4G, kexec failed to load the crash kernel, but the memory
reservation _did_ succeed.
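
For reference, a small sketch of which physical range that parameter
asks for. It handles only the simple size@offset form used here, and
parse_size and parse_crashkernel are hypothetical helper names, not the
kernel's own parser.

```python
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse a number with an optional K/M/G suffix into bytes."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return int(text[:-1], 0) * SUFFIXES[text[-1].upper()]
    return int(text, 0)


def parse_crashkernel(arg):
    """Return (start, end) for 'crashkernel=size@offset'; end is exclusive."""
    value = arg.split("=", 1)[1]
    size_text, offset_text = value.split("@", 1)
    start = parse_size(offset_text)
    return start, start + parse_size(size_text)


start, end = parse_crashkernel("crashkernel=1G@4G")
print(f"reserved range: {start:#x} .. {end:#x} (end exclusive)")
print("entirely above 4G:", start >= 4 * (1 << 30))
```

The whole reservation sits at or above the 4G mark, so the reservation
can succeed while the loader still refuses to place an image there.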

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-23  5:21         ` Cong Wang
@ 2010-04-23  5:42           ` Eric W. Biederman
  -1 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2010-04-23  5:42 UTC (permalink / raw)
  To: Cong Wang
  Cc: Vivek Goyal, Vitaly Mayatskikh, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Haren Myneni, Neil Horman, kexec

Cong Wang <amwang@redhat.com> writes:

> Eric W. Biederman wrote:
>> Vivek Goyal <vgoyal@redhat.com> writes:
>>
>>> Vitaly, have you really run into cases where 2G upper limit is a concern.
>>> What is the configuration you have, how much memory it has and how much
>>> memory are you planning to reserve for kdump kernel?
>>
>> A good question.
>>
>
> We have observed that on a machine which has 66G memory, when we do
> crashkernel=1G@4G, kexec failed to load the crash kernel, but the memory
> reservation _did_ succeed.

Did you try loading vmlinux?  If not, this sounds like /sbin/kexec
not realizing that it can boot a 64bit bzImage in 64bit mode.

Eric


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-23  5:42           ` Eric W. Biederman
@ 2010-04-23  6:43             ` Vitaly Mayatskikh
  -1 siblings, 0 replies; 30+ messages in thread
From: Vitaly Mayatskikh @ 2010-04-23  6:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Cong Wang, Vivek Goyal, Vitaly Mayatskikh, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Haren Myneni,
	Neil Horman, kexec

At Thu, 22 Apr 2010 22:42:25 -0700, Eric W. Biederman wrote:

> > We have observed that on a machine which has 66G memory, when we do
> > crashkernel=1G@4G, kexec failed to load the crash kernel, but the memory
> > reservation _did_ succeed.
> 
> Did you try loading vmlinux?   If not this sounds like the fact that
> /sbin/kexec doesn't realize it can boot a 64bit bzImage in 64bit
> mode.

/sbin/kexec currently has hardcoded limitations for bzImage and
initrd:

include/x86/x86-linux.h:

#define DEFAULT_INITRD_ADDR_MAX 0x37FFFFFF
#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF

These limits are easy to override. However, the purgatory code still
needs the kernel below 2 GB, because it uses 32-bit signed relocations.
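
A short sketch of the two limits. The constants are copied from the
header quoted above; fits_r_x86_64_32s is a hypothetical helper that
models the R_X86_64_32S rule (the value must survive truncation to 32
bits and sign extension back to 64 bits), not code from kexec-tools.

```python
DEFAULT_INITRD_ADDR_MAX = 0x37FFFFFF
DEFAULT_BZIMAGE_ADDR_MAX = 0x37FFFFFF

MIB = 1 << 20
GIB = 1 << 30


def fits_r_x86_64_32s(addr):
    """True if addr is representable as a signed 32-bit value."""
    return -(1 << 31) <= addr <= (1 << 31) - 1


# The hardcoded ceiling is the last byte below 896 MB.
print("bzImage/initrd ceiling:", (DEFAULT_BZIMAGE_ADDR_MAX + 1) // MIB, "MB")

# Physical addresses are non-negative, so the relocation type limits
# them to the first 2 GB regardless of the ceiling above.
for addr in (DEFAULT_BZIMAGE_ADDR_MAX, 2 * GIB - 1, 2 * GIB, 4 * GIB):
    print(f"{addr:#x}: fits signed 32-bit = {fits_r_x86_64_32s(addr)}")
```

Raising the two defines only moves the first limit; the second one comes
from the relocation type itself.
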
-- 
wbr, Vitaly

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-22 22:45     ` Vivek Goyal
@ 2010-04-23  7:08       ` Vitaly Mayatskikh
  -1 siblings, 0 replies; 30+ messages in thread
From: Vitaly Mayatskikh @ 2010-04-23  7:08 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Vitaly Mayatskikh, linux-kernel,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Haren Myneni,
	Neil Horman, Cong Wang, kexec

At Thu, 22 Apr 2010 18:45:25 -0400, Vivek Goyal wrote:

> Vitaly, have you really run into cases where 2G upper limit is a concern.
> What is the configuration you have, how much memory it has and how much
> memory are you planning to reserve for kdump kernel?

I tried it on a system with 96G of RAM. When I reserved 512M for the
kdump kernel, the system stopped loading somewhere in user space. With a
larger reserved area, /sbin/kexec can't load the kernel (because of the
hardcoded limitation in /sbin/kexec). After removing this limitation,
the kernel was loaded below 2G, but the system didn't even boot.

Unfortunately, I don't remember the exact details now and temporarily
have no access to that machine. I will try to get access and come back
with details.
-- 
wbr, Vitaly

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Add second memory region for crash kernel
  2010-04-23  0:48       ` Eric W. Biederman
@ 2010-04-23 14:44         ` Vivek Goyal
  -1 siblings, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2010-04-23 14:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vitaly Mayatskikh, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Haren Myneni, Neil Horman, Cong Wang, kexec

On Thu, Apr 22, 2010 at 05:48:53PM -0700, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
> > On Thu, Apr 22, 2010 at 03:07:11PM -0700, Eric W. Biederman wrote:
> >> Vitaly Mayatskikh <v.mayatskih@gmail.com> writes:
> >> >
> >> > This serie of patches realizes this approach. It requires also changes
> >> > in kexec utility to make this feature work, but is
> >> > backward-compatible: old versions of kexec will work with new
> >> > kernel. I will post patch to kexec-tools upstream separately.
> >> 
> >> Have you tried loading a 64bit vmlinux directly into a higher address
> >> range?  There may be a bit or two missing but you should be able to
> >> load a linux kernel above 4GB.  I tested the basics of that mechanism
> >> when I made the 64bit relocatable kernel.
> >
> > I guess even if it works, for distributions it will become additional
> > liability to carry vmlinux (instead of relocatable bzImage). So we shall
> > have to find a way to make bzImage work.
> 
> As Peter pointed out, we actually have everything we need except
> a bit of documentation and the flag that says this is a 64bit kernel.
> 
> From a testing perspective, a 64bit vmlinux should work today without
> changes.  Once it is confirmed that there is a solution with the 64bit
> kernel, we just need a small patch to boot.txt and a few tweaks to
> /sbin/kexec to handle a 64bit bzImage.
> 

Agreed. Doing a little more testing, fixing some issues if need be, and
making a 64bit bzImage work is a better way than splitting the reserved
memory.
 
Thanks
Vivek

^ permalink raw reply	[flat|nested] 30+ messages in thread

Thread overview: 30+ messages
2010-04-22 16:23 [PATCH 0/5] Add second memory region for crash kernel Vitaly Mayatskikh
2010-04-22 16:23 ` Vitaly Mayatskikh
2010-04-22 16:23 ` [PATCH 1/5] Introduce second memory resource " Vitaly Mayatskikh
2010-04-22 16:23   ` Vitaly Mayatskikh
2010-04-22 16:23 ` [PATCH 2/5] Modify parse_crashkernel* for new syntax Vitaly Mayatskikh
2010-04-22 16:23   ` Vitaly Mayatskikh
2010-04-22 16:23 ` [PATCH 3/5] Support second memory region in crash_shrink_memory() Vitaly Mayatskikh
2010-04-22 16:23   ` Vitaly Mayatskikh
2010-04-22 16:23 ` [PATCH 4/5] x86: use second memory region for dump-capture kernel Vitaly Mayatskikh
2010-04-22 16:23   ` Vitaly Mayatskikh
2010-04-22 16:23 ` [PATCH 5/5] kexec: update documentation Vitaly Mayatskikh
2010-04-22 16:23   ` Vitaly Mayatskikh
2010-04-22 22:07 ` [PATCH 0/5] Add second memory region for crash kernel Eric W. Biederman
2010-04-22 22:07   ` Eric W. Biederman
2010-04-22 22:37   ` H. Peter Anvin
2010-04-22 22:37     ` H. Peter Anvin
2010-04-22 22:45   ` Vivek Goyal
2010-04-22 22:45     ` Vivek Goyal
2010-04-23  0:48     ` Eric W. Biederman
2010-04-23  0:48       ` Eric W. Biederman
2010-04-23  5:21       ` Cong Wang
2010-04-23  5:21         ` Cong Wang
2010-04-23  5:42         ` Eric W. Biederman
2010-04-23  5:42           ` Eric W. Biederman
2010-04-23  6:43           ` Vitaly Mayatskikh
2010-04-23  6:43             ` Vitaly Mayatskikh
2010-04-23 14:44       ` Vivek Goyal
2010-04-23 14:44         ` Vivek Goyal
2010-04-23  7:08     ` Vitaly Mayatskikh
2010-04-23  7:08       ` Vitaly Mayatskikh