All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] x86, kdump: Fix crashkernel high with old kexec-tools
@ 2013-04-02 17:19 Yinghai Lu
  2013-04-02 17:19 ` [PATCH 1/4] x86, kdump: Set crashkernel_low automatically Yinghai Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 17:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: WANG Chao, Vivek Goyal, Eric W. Biederman, linux-kernel

Vivek found some problems with old kexec-tools.

We keep the old crashkernel=X to old behavoir, so it will not break
old tools.
Add crashkernel=X,high to support new kexec-tools that supports loading high.
when high is used, memblock will search from top to low.
if the allocated one is above 4G, kernel will try to auto allocate
72M under 4G for swiotlb.
user could crashkernel=Y,low to change 72M to other value.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] x86, kdump: Set crashkernel_low automatically
  2013-04-02 17:19 [PATCH 0/4] x86, kdump: Fix crashkernel high with old kexec-tools Yinghai Lu
@ 2013-04-02 17:19 ` Yinghai Lu
  2013-04-02 17:19 ` [PATCH 2/4] kexec: use Crash kernel for Crash kernel low Yinghai Lu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 17:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: WANG Chao, Vivek Goyal, Eric W. Biederman, linux-kernel, Yinghai Lu

Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y.  This causes regression
if iommu is not enabled.  Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.

Set crashkernel_low automatically if the user does not specify that.

For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.

-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
     also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.

Reported-by: WANG Chao <chaowang@redhat.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 Documentation/kernel-parameters.txt |   15 ++++++++++++---
 arch/x86/kernel/setup.c             |   20 +++++++++++++++++---
 include/linux/swiotlb.h             |    1 +
 lib/swiotlb.c                       |   19 +++++++++++++++----
 4 files changed, 45 insertions(+), 10 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -521,19 +521,33 @@ static void __init reserve_crashkernel_l
 	unsigned long long low_base = 0, low_size = 0;
 	unsigned long total_low_mem;
 	unsigned long long base;
+	bool auto_set = false;
 	int ret;
 
 	total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
 	ret = parse_crashkernel_low(boot_command_line, total_low_mem,
 						&low_size, &base);
-	if (ret != 0 || low_size <= 0)
-		return;
+	if (ret != 0) {
+		/*
+		 * two parts from lib/swiotlb.c:
+		 *	swiotlb size: user specified with swiotlb= or default.
+		 *	swiotlb overflow buffer: now is hardcoded to 32k,
+		 *		round to 8M to cover more others.
+		 */
+		low_size = swiotlb_size_or_default() + (8UL<<20);
+		auto_set = true;
+	} else {
+		/* passed with crashkernel_low=0 ? */
+		if (!low_size)
+			return;
+	}
 
 	low_base = memblock_find_in_range(low_size, (1ULL<<32),
 					low_size, alignment);
 
 	if (!low_base) {
-		pr_info("crashkernel low reservation failed - No suitable area found.\n");
+		if (!auto_set)
+			pr_info("crashkernel low reservation failed - No suitable area found.\n");
 
 		return;
 	}
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -25,6 +25,7 @@ extern int swiotlb_force;
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
+unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 
 /*
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -105,9 +105,9 @@ setup_io_tlb_npages(char *str)
 	if (!strcmp(str, "force"))
 		swiotlb_force = 1;
 
-	return 1;
+	return 0;
 }
-__setup("swiotlb=", setup_io_tlb_npages);
+early_param("swiotlb", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 
 unsigned long swiotlb_nr_tbl(void)
@@ -115,6 +115,18 @@ unsigned long swiotlb_nr_tbl(void)
 	return io_tlb_nslabs;
 }
 EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
+
+/* default to 64MB */
+#define IO_TLB_DEFAULT_SIZE (64UL<<20)
+unsigned long swiotlb_size_or_default(void)
+{
+	unsigned long size;
+
+	size = io_tlb_nslabs << IO_TLB_SHIFT;
+
+	return size ? size : (IO_TLB_DEFAULT_SIZE);
+}
+
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
 				      volatile void *address)
@@ -188,8 +200,7 @@ int __init swiotlb_init_with_tbl(char *t
 void  __init
 swiotlb_init(int verbose)
 {
-	/* default to 64MB */
-	size_t default_size = 64UL<<20;
+	size_t default_size = IO_TLB_DEFAULT_SIZE;
 	unsigned char *vstart;
 	unsigned long bytes;
 
Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -596,9 +596,6 @@ bytes respectively. Such letter suffixes
 			is selected automatically. Check
 			Documentation/kdump/kdump.txt for further details.
 
-	crashkernel_low=size[KMG]
-			[KNL, x86] parts under 4G.
-
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
 			in the running system. The syntax of range is
@@ -606,6 +603,18 @@ bytes respectively. Such letter suffixes
 			a memory unit (amount[KMG]). See also
 			Documentation/kdump/kdump.txt for an example.
 
+	crashkernel_low=size[KMG]
+			[KNL, x86_64] range under 4G. When crashkernel= is
+			passed, kernel allocate physical memory region
+			above 4G, that cause second kernel crash on system
+			that need swiotlb later. Kernel would try to allocate
+			some region below 4G automatically. This one let
+			user to specify own low range under 4G for second
+			kernel instead.
+			0: to disable low allocation on systems that do not
+			need swiotlb, that will save 72M low ram in first
+			kernel.
+
 	cs89x0_dma=	[HW,NET]
 			Format: <dma>
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/4] kexec: use Crash kernel for Crash kernel low
  2013-04-02 17:19 [PATCH 0/4] x86, kdump: Fix crashkernel high with old kexec-tools Yinghai Lu
  2013-04-02 17:19 ` [PATCH 1/4] x86, kdump: Set crashkernel_low automatically Yinghai Lu
@ 2013-04-02 17:19 ` Yinghai Lu
  2013-04-02 17:19 ` [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low Yinghai Lu
  2013-04-02 17:19 ` [PATCH] x86, kdump: Change crashkernel_high/low= to crashkernel=;high/low Yinghai Lu
  3 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 17:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: WANG Chao, Vivek Goyal, Eric W. Biederman, linux-kernel, Yinghai Lu

We can extend kexec-tools to support multiple "Crash kernel" in /proc/iomem
instead.

So we can use "Crash kernel" instead of "Crash kernel low" in /proc/iomem.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 kernel/kexec.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -55,7 +55,7 @@ struct resource crashk_res = {
 	.flags = IORESOURCE_BUSY | IORESOURCE_MEM
 };
 struct resource crashk_low_res = {
-	.name  = "Crash kernel low",
+	.name  = "Crash kernel",
 	.start = 0,
 	.end   = 0,
 	.flags = IORESOURCE_BUSY | IORESOURCE_MEM

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 17:19 [PATCH 0/4] x86, kdump: Fix crashkernel high with old kexec-tools Yinghai Lu
  2013-04-02 17:19 ` [PATCH 1/4] x86, kdump: Set crashkernel_low automatically Yinghai Lu
  2013-04-02 17:19 ` [PATCH 2/4] kexec: use Crash kernel for Crash kernel low Yinghai Lu
@ 2013-04-02 17:19 ` Yinghai Lu
  2013-04-02 18:06   ` Vivek Goyal
  2013-04-02 17:19 ` [PATCH] x86, kdump: Change crashkernel_high/low= to crashkernel=;high/low Yinghai Lu
  3 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 17:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: WANG Chao, Vivek Goyal, Eric W. Biederman, linux-kernel, Yinghai Lu

Vivek found old kexec-tools does not work new kernel anymore.

So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 Documentation/kernel-parameters.txt |    8 ++++++--
 arch/x86/kernel/setup.c             |   26 ++++++++++++++++++++------
 include/linux/kexec.h               |    2 ++
 kernel/kexec.c                      |    9 +++++++++
 4 files changed, 37 insertions(+), 8 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,9 +603,13 @@ bytes respectively. Such letter suffixes
 			a memory unit (amount[KMG]). See also
 			Documentation/kdump/kdump.txt for an example.
 
+	crashkernel_high=size[KMG]
+			[KNL, x86_64] range could be above 4G. Allow kernel
+			to allocate physical memory region from top, so could
+			be above 4G if system have more than 4G ram installed.
 	crashkernel_low=size[KMG]
-			[KNL, x86_64] range under 4G. When crashkernel= is
-			passed, kernel allocate physical memory region
+			[KNL, x86_64] range under 4G. When crashkernel_high= is
+			passed, kernel could allocate physical memory region
 			above 4G, that cause second kernel crash on system
 			that need swiotlb later. Kernel would try to allocate
 			some region below 4G automatically. This one let
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -507,11 +507,14 @@ static void __init memblock_x86_reserve_
 /*
  * Keep the crash kernel below this limit.  On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
+ * On 64bit, old kexec-tools need to under 896MiB.
  */
 #ifdef CONFIG_X86_32
-# define CRASH_KERNEL_ADDR_MAX	(512 << 20)
+# define CRASH_KERNEL_ADDR_LOW_MAX	(512 << 20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX	(512 << 20)
 #else
-# define CRASH_KERNEL_ADDR_MAX	MAXMEM
+# define CRASH_KERNEL_ADDR_LOW_MAX	(896UL<<20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX	MAXMEM
 #endif
 
 static void __init reserve_crashkernel_low(void)
@@ -525,6 +528,7 @@ static void __init reserve_crashkernel_l
 	int ret;
 
 	total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
+	/* crashkernel_low=YM */
 	ret = parse_crashkernel_low(boot_command_line, total_low_mem,
 						&low_size, &base);
 	if (ret != 0) {
@@ -568,14 +572,22 @@ static void __init reserve_crashkernel(v
 	const unsigned long long alignment = 16<<20;	/* 16M */
 	unsigned long long total_mem;
 	unsigned long long crash_size, crash_base;
+	bool high = true;
 	int ret;
 
 	total_mem = memblock_phys_mem_size();
 
-	ret = parse_crashkernel(boot_command_line, total_mem,
+	/* crashkernel_high=XM */
+	ret = parse_crashkernel_high(boot_command_line, total_mem,
 			&crash_size, &crash_base);
-	if (ret != 0 || crash_size <= 0)
-		return;
+	if (ret != 0 || crash_size <= 0) {
+		/* crashkernel=XM */
+		ret = parse_crashkernel(boot_command_line, total_mem,
+				&crash_size, &crash_base);
+		if (ret != 0 || crash_size <= 0)
+			return;
+		high = false;
+	}
 
 	/* 0 means: find the address automatically */
 	if (crash_base <= 0) {
@@ -583,7 +595,9 @@ static void __init reserve_crashkernel(v
 		 *  kexec want bzImage is below CRASH_KERNEL_ADDR_MAX
 		 */
 		crash_base = memblock_find_in_range(alignment,
-			       CRASH_KERNEL_ADDR_MAX, crash_size, alignment);
+					high ? CRASH_KERNEL_ADDR_HIGH_MAX :
+					       CRASH_KERNEL_ADDR_LOW_MAX,
+					crash_size, alignment);
 
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
Index: linux-2.6/include/linux/kexec.h
===================================================================
--- linux-2.6.orig/include/linux/kexec.h
+++ linux-2.6/include/linux/kexec.h
@@ -200,6 +200,8 @@ extern size_t vmcoreinfo_max_size;
 
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
+int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
+		unsigned long long *crash_size, unsigned long long *crash_base);
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 int crash_shrink_memory(unsigned long new_size);
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1422,6 +1422,15 @@ int __init parse_crashkernel(char *cmdli
 					"crashkernel=");
 }
 
+int __init parse_crashkernel_high(char *cmdline,
+			     unsigned long long system_ram,
+			     unsigned long long *crash_size,
+			     unsigned long long *crash_base)
+{
+	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+					"crashkernel_high=");
+}
+
 int __init parse_crashkernel_low(char *cmdline,
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] x86, kdump: Change crashkernel_high/low= to crashkernel=;high/low
  2013-04-02 17:19 [PATCH 0/4] x86, kdump: Fix crashkernel high with old kexec-tools Yinghai Lu
                   ` (2 preceding siblings ...)
  2013-04-02 17:19 ` [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low Yinghai Lu
@ 2013-04-02 17:19 ` Yinghai Lu
  3 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 17:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: WANG Chao, Vivek Goyal, Eric W. Biederman, linux-kernel, Yinghai Lu

Per hpa, use crashkernel=XM;high crashkernel=YM;low instead of
crashkernel_hign=XM crashkernel_low=YM. As that could be extensible.

-v2: according to Vivek, change delimiter to ;

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 Documentation/kernel-parameters.txt |    8 ++++----
 arch/x86/kernel/setup.c             |    6 +++---
 kernel/kexec.c                      |   23 +++++++++++++++++------
 3 files changed, 24 insertions(+), 13 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,13 +603,13 @@ bytes respectively. Such letter suffixes
 			a memory unit (amount[KMG]). See also
 			Documentation/kdump/kdump.txt for an example.
 
-	crashkernel_high=size[KMG]
+	crashkernel=size[KMG];high
 			[KNL, x86_64] range could be above 4G. Allow kernel
 			to allocate physical memory region from top, so could
 			be above 4G if system have more than 4G ram installed.
-	crashkernel_low=size[KMG]
-			[KNL, x86_64] range under 4G. When crashkernel_high= is
-			passed, kernel could allocate physical memory region
+	crashkernel=size[KMG];low
+			[KNL, x86_64] range under 4G. When crashkernel=X;high
+			is passed, kernel could allocate physical memory region
 			above 4G, that cause second kernel crash on system
 			that need swiotlb later. Kernel would try to allocate
 			some region below 4G automatically. This one let
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -528,7 +528,7 @@ static void __init reserve_crashkernel_l
 	int ret;
 
 	total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
-	/* crashkernel_low=YM */
+	/* crashkernel=YM;low */
 	ret = parse_crashkernel_low(boot_command_line, total_low_mem,
 						&low_size, &base);
 	if (ret != 0) {
@@ -541,7 +541,7 @@ static void __init reserve_crashkernel_l
 		low_size = swiotlb_size_or_default() + (8UL<<20);
 		auto_set = true;
 	} else {
-		/* passed with crashkernel_low=0 ? */
+		/* passed with crashkernel=0;low ? */
 		if (!low_size)
 			return;
 	}
@@ -577,7 +577,7 @@ static void __init reserve_crashkernel(v
 
 	total_mem = memblock_phys_mem_size();
 
-	/* crashkernel_high=XM */
+	/* crashkernel=XM;high */
 	ret = parse_crashkernel_high(boot_command_line, total_mem,
 			&crash_size, &crash_base);
 	if (ret != 0 || crash_size <= 0) {
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1360,7 +1360,7 @@ static int __init parse_crashkernel_simp
 
 	if (*cur == '@')
 		*crash_base = memparse(cur+1, &cur);
-	else if (*cur != ' ' && *cur != '\0') {
+	else if (*cur != ' ' && *cur != ';' && *cur != '\0') {
 		pr_warning("crashkernel: unrecognized char\n");
 		return -EINVAL;
 	}
@@ -1376,7 +1376,8 @@ static int __init __parse_crashkernel(ch
 			     unsigned long long system_ram,
 			     unsigned long long *crash_size,
 			     unsigned long long *crash_base,
-				const char *name)
+			     const char *name,
+			     const char *suffix)
 {
 	char 	*p = cmdline, *ck_cmdline = NULL;
 	char	*first_colon, *first_space;
@@ -1388,7 +1389,17 @@ static int __init __parse_crashkernel(ch
 	/* find crashkernel and use the last one if there are more */
 	p = strstr(p, name);
 	while (p) {
-		ck_cmdline = p;
+		if (!suffix)
+			ck_cmdline = p;
+		else {
+			char *end_p = strchr(p, ' ');
+
+			if (!end_p)
+				end_p = p + strlen(p);
+			end_p -= strlen(suffix);
+			if (!strncmp(end_p, suffix, strlen(suffix)))
+				ck_cmdline = p;
+		}
 		p = strstr(p+1, name);
 	}
 
@@ -1419,7 +1430,7 @@ int __init parse_crashkernel(char *cmdli
 			     unsigned long long *crash_base)
 {
 	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-					"crashkernel=");
+					"crashkernel=", NULL);
 }
 
 int __init parse_crashkernel_high(char *cmdline,
@@ -1428,7 +1439,7 @@ int __init parse_crashkernel_high(char *
 			     unsigned long long *crash_base)
 {
 	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-					"crashkernel_high=");
+					"crashkernel=", ";high");
 }
 
 int __init parse_crashkernel_low(char *cmdline,
@@ -1437,7 +1448,7 @@ int __init parse_crashkernel_low(char *c
 			     unsigned long long *crash_base)
 {
 	return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-					"crashkernel_low=");
+					"crashkernel=", ";low");
 }
 
 static void update_vmcoreinfo_note(void)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 17:19 ` [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low Yinghai Lu
@ 2013-04-02 18:06   ` Vivek Goyal
  2013-04-02 18:42     ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Vivek Goyal @ 2013-04-02 18:06 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, linux-kernel

On Tue, Apr 02, 2013 at 10:19:42AM -0700, Yinghai Lu wrote:

[..]
> Index: linux-2.6/Documentation/kernel-parameters.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/kernel-parameters.txt
> +++ linux-2.6/Documentation/kernel-parameters.txt
> @@ -603,9 +603,13 @@ bytes respectively. Such letter suffixes
>  			a memory unit (amount[KMG]). See also
>  			Documentation/kdump/kdump.txt for an example.
>  
> +	crashkernel_high=size[KMG]
> +			[KNL, x86_64] range could be above 4G. Allow kernel
> +			to allocate physical memory region from top, so could
> +			be above 4G if system have more than 4G ram installed.
>  	crashkernel_low=size[KMG]
> -			[KNL, x86_64] range under 4G. When crashkernel= is
> -			passed, kernel allocate physical memory region
> +			[KNL, x86_64] range under 4G. When crashkernel_high= is
> +			passed, kernel could allocate physical memory region
>  			above 4G, that cause second kernel crash on system
>  			that need swiotlb later. Kernel would try to allocate
>  			some region below 4G automatically. This one let

Hi Yinghai,

I think there are still some issues with crashkernel= semantics.

What if I specify both crashkernel_high= as well as crashkernel_low=.
Looks like crashkernel_low will be parsed only if crashkernel_high
reserved memory above 4G.

So if one gives following command line.

crashkernel=256M;high crashkernel=100M;low

Final outcome will vary across systems. If system has all RAM below 4G
we will see only one 256M chunk reserved otherwise we will see one 256M
and one 100M chunk reserved. And a user might think that I asked you to
reserve two chunks. One 256M and otherr 100M.

Also interesting is, what if user specifies both crashkernel=X and
crashkernel=Y;high. Looks like we will ignore crashkernel=X and honor
only crashkernel=Y;high.

So the problem here is, do we want to parse multiple crashkernel=
command line and support reserving multiple ranges? Till 3.8 kernel
we did not do that.  If we want to do that, then parsing crashkernel=
logic needs to be more generic. 

- I would say that to keep things simple, we can stick to semantics
  of 3.8 kernel and say only first crashkernel= option is parsed and
  acted upon. Rest are ignored. Trying to support multiple ranges will
  require much more work.

- If we say that we will only parse first crashkernel= option, then 
  crashkernel=X;high and crashkernel0;low can not co-exist. I would say
  use a new option to disable automatically reserved low memory. Say,
  crashkernel_no_auto_low; That way it can co-exist with other
  crashkernel= options without any conflict.

  In fact this will also work with crashkernel=X, if we decide to extend
  crashkernel=X to look for memory below 4G and look beyond 4G.

- Support crashkernel=X;high as a new crashkernel= option.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 18:06   ` Vivek Goyal
@ 2013-04-02 18:42     ` Yinghai Lu
  2013-04-02 18:49       ` Yinghai Lu
  2013-04-02 19:09       ` Vivek Goyal
  0 siblings, 2 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 18:42 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 2, 2013 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 02, 2013 at 10:19:42AM -0700, Yinghai Lu wrote:
>
> [..]
>> Index: linux-2.6/Documentation/kernel-parameters.txt
>> ===================================================================
>> --- linux-2.6.orig/Documentation/kernel-parameters.txt
>> +++ linux-2.6/Documentation/kernel-parameters.txt
>> @@ -603,9 +603,13 @@ bytes respectively. Such letter suffixes
>>                       a memory unit (amount[KMG]). See also
>>                       Documentation/kdump/kdump.txt for an example.
>>
>> +     crashkernel_high=size[KMG]
>> +                     [KNL, x86_64] range could be above 4G. Allow kernel
>> +                     to allocate physical memory region from top, so could
>> +                     be above 4G if system have more than 4G ram installed.
>>       crashkernel_low=size[KMG]
>> -                     [KNL, x86_64] range under 4G. When crashkernel= is
>> -                     passed, kernel allocate physical memory region
>> +                     [KNL, x86_64] range under 4G. When crashkernel_high= is
>> +                     passed, kernel could allocate physical memory region
>>                       above 4G, that cause second kernel crash on system
>>                       that need swiotlb later. Kernel would try to allocate
>>                       some region below 4G automatically. This one let
>
> Hi Yinghai,
>
> I think there are still some issues with crashkernel= semantics.
>
> What if I specify both crashkernel_high= as well as crashkernel_low=.
> Looks like crashkernel_low will be parsed only if crashkernel_high
> reserved memory above 4G.
>
> So if one gives following command line.
>
> crashkernel=256M;high crashkernel=100M;low
>
> Final outcome will vary across systems. If system has all RAM below 4G
> we will see only one 256M chunk reserved otherwise we will see one 256M
> and one 100M chunk reserved. And a user might think that I asked you to
> reserve two chunks. One 256M and otherr 100M.

Yes, that is intentional.

If you like, I could remove that checking, just add the low.

>
> Also interesting is, what if user specifies both crashkernel=X and
> crashkernel=Y;high. Looks like we will ignore crashkernel=X and honor
> only crashkernel=Y;high.

Yes, that is intentional.

>
> So the problem here is, do we want to parse multiple crashkernel=
> command line and support reserving multiple ranges? Till 3.8 kernel
> we did not do that.  If we want to do that, then parsing crashkernel=
> logic needs to be more generic.
>
> - I would say that to keep things simple, we can stick to semantics
>   of 3.8 kernel and say only first crashkernel= option is parsed and
>   acted upon. Rest are ignored. Trying to support multiple ranges will
>   require much more work.

we could do that, but that is not necessary.

>
> - If we say that we will only parse first crashkernel= option, then
>   crashkernel=X;high and crashkernel0;low can not co-exist. I would say
>   use a new option to disable automatically reserved low memory. Say,
>   crashkernel_no_auto_low; That way it can co-exist with other
>   crashkernel= options without any conflict.

I don't see any reason to make them co-exist.

aka:
old kexec-tools stay with "crashkernel=X"
new kexec-tools stay with
1. like old kexec tools
2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
Y could be 100M or 0 etc.

>
>   In fact this will also work with crashkernel=X, if we decide to extend
>   crashkernel=X to look for memory below 4G and look beyond 4G.
>
> - Support crashkernel=X;high as a new crashkernel= option.

Actually we still support only one region that is could be high or low,
and that extra low is just for workaround
buggy system that does not support iommu with kdump.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 18:42     ` Yinghai Lu
@ 2013-04-02 18:49       ` Yinghai Lu
  2013-04-02 19:11         ` Vivek Goyal
  2013-04-02 19:09       ` Vivek Goyal
  1 sibling, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 18:49 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> aka:
> old kexec-tools stay with "crashkernel=X"
> new kexec-tools stay with
> 1. like old kexec tools
> 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> Y could be 100M or 0 etc.

I keep the old logic like:
if there are several "crashkernel=X,high", only last one is honored.
if there are several "crashkernel=Y,low", only last one is honored.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 18:42     ` Yinghai Lu
  2013-04-02 18:49       ` Yinghai Lu
@ 2013-04-02 19:09       ` Vivek Goyal
  2013-04-02 20:04         ` Yinghai Lu
  1 sibling, 1 reply; 25+ messages in thread
From: Vivek Goyal @ 2013-04-02 19:09 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 02, 2013 at 11:42:09AM -0700, Yinghai Lu wrote:
> On Tue, Apr 2, 2013 at 11:06 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 02, 2013 at 10:19:42AM -0700, Yinghai Lu wrote:
> >
> > [..]
> >> Index: linux-2.6/Documentation/kernel-parameters.txt
> >> ===================================================================
> >> --- linux-2.6.orig/Documentation/kernel-parameters.txt
> >> +++ linux-2.6/Documentation/kernel-parameters.txt
> >> @@ -603,9 +603,13 @@ bytes respectively. Such letter suffixes
> >>                       a memory unit (amount[KMG]). See also
> >>                       Documentation/kdump/kdump.txt for an example.
> >>
> >> +     crashkernel_high=size[KMG]
> >> +                     [KNL, x86_64] range could be above 4G. Allow kernel
> >> +                     to allocate physical memory region from top, so could
> >> +                     be above 4G if system have more than 4G ram installed.
> >>       crashkernel_low=size[KMG]
> >> -                     [KNL, x86_64] range under 4G. When crashkernel= is
> >> -                     passed, kernel allocate physical memory region
> >> +                     [KNL, x86_64] range under 4G. When crashkernel_high= is
> >> +                     passed, kernel could allocate physical memory region
> >>                       above 4G, that cause second kernel crash on system
> >>                       that need swiotlb later. Kernel would try to allocate
> >>                       some region below 4G automatically. This one let
> >
> > Hi Yinghai,
> >
> > I think there are still some issues with crashkernel= semantics.
> >
> > What if I specify both crashkernel_high= as well as crashkernel_low=.
> > Looks like crashkernel_low will be parsed only if crashkernel_high
> > reserved memory above 4G.
> >
> > So if one gives following command line.
> >
> > crashkernel=256M;high crashkernel=100M;low
> >
> > Final outcome will vary across systems. If system has all RAM below 4G
> > we will see only one 256M chunk reserved otherwise we will see one 256M
> > and one 100M chunk reserved. And a user might think that I asked you to
> > reserve two chunks. One 256M and otherr 100M.
> 
> Yes, that is intentional.

Why it is intentional. It seems be to aberration from user's point of
view.

> 
> If you like, I could remove that checking, just add the low.
> 
> >
> > Also interesting is, what if user specifies both crashkernel=X and
> > crashkernel=Y;high. Looks like we will ignore crashkernel=X and honor
> > only crashkernel=Y;high.
> 
> Yes, that is intentional.

Again, it is not clear that why are we prefering crashkernel=Y;high
over crashkernel=X. There needs to be clearly defined behavior.

> 
> >
> > So the problem here is, do we want to parse multiple crashkernel=
> > command line and support reserving multiple ranges? Till 3.8 kernel
> > we did not do that.  If we want to do that, then parsing crashkernel=
> > logic needs to be more generic.
> >
> > - I would say that to keep things simple, we can stick to semantics
> >   of 3.8 kernel and say only first crashkernel= option is parsed and
> >   acted upon. Rest are ignored. Trying to support multiple ranges will
> >   require much more work.
> 
> we could do that, but that is not necessary.
> 
> >
> > - If we say that we will only parse first crashkernel= option, then
> >   crashkernel=X;high and crashkernel0;low can not co-exist. I would say
> >   use a new option to disable automatically reserved low memory. Say,
> >   crashkernel_no_auto_low; That way it can co-exist with other
> >   crashkernel= options without any conflict.
> 
> I don't see any reason to make them co-exist.

We still need to define a clear behavior. What happens if user specifies
multiple crashkernel= options.

> 
> aka:
> old kexec-tools stay with "crashkernel=X"
> new kexec-tools stay with
> 1. like old kexec tools
> 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> Y could be 100M or 0 etc.

You are thinking that user will specify only the options you are looking
for. But a user is free to specify all the possible inputs and we need
to define very clearly what happens in those cases.

> 
> >
> >   In fact this will also work with crashkernel=X, if we decide to extend
> >   crashkernel=X to look for memory below 4G and look beyond 4G.
> >
> > - Support crashkernel=X;high as a new crashkernel= option.
> 
> Actually we still support only one region that is could be high or low,
> and that extra low is just for workaround
> buggy system that does not support iommu with kdump.

Well, crashkernel=X;high crashkernel=Y;low will reserve two ranges (one
high and one low). So in some cases we are supporting 2 and in some
cases we are supporting 1 range.

So I still think that let us stick to old behavior of supporting one
crashkernel= option. Last crashkernel= option on command line will be
acted upon.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 18:49       ` Yinghai Lu
@ 2013-04-02 19:11         ` Vivek Goyal
  2013-04-02 20:00           ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Vivek Goyal @ 2013-04-02 19:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > aka:
> > old kexec-tools stay with "crashkernel=X"
> > new kexec-tools stay with
> > 1. like old kexec tools
> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> > Y could be 100M or 0 etc.
> 
> I keep the old logic like:
> if there are several "crashkernel=X,high", only last one is honored.
> if there are several "crashkernel=Y,low", only last one is honored.

Yes but if different types of crashkernel= options are mixes then
behavior is not defined.

crashkernel=X,high crashkernel=X
crashkernel=X,high crashkernel=Y;low
crashkernel=Y;low crashkernel=X

And possibilities go on. So I think it makes life simpler if we always
parse last crashkernel= option and act upon that. And use
crashkernel_no_auto_low to opt out of auto reserved low memory area.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 19:11         ` Vivek Goyal
@ 2013-04-02 20:00           ` Yinghai Lu
  2013-04-02 20:11             ` Vivek Goyal
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 20:00 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 2, 2013 at 12:11 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
>> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> > aka:
>> > old kexec-tools stay with "crashkernel=X"
>> > new kexec-tools stay with
>> > 1. like old kexec tools
>> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
>> > Y could be 100M or 0 etc.
>>
>> I keep the old logic like:
>> if there are several "crashkernel=X,high", only last one is honored.
>> if there are several "crashkernel=Y,low", only last one is honored.
>
> Yes but if different types of crashkernel= options are mixes then
> behavior is not defined.

dmesg or /proc/iomem will show them what is finally reserved.

>
> crashkernel=X,high crashkernel=X

==> crashkernel=X is tossed away.

> crashkernel=X,high crashkernel=Y;low

normal case. if user want more or disable low range.

> crashkernel=Y;low crashkernel=X

crashkernel=X will be used.

>
> And possibilities go on. So I think it makes life simpler if we always
> parse last crashkernel= option and act upon that. And use
> crashkernel_no_auto_low to opt out of auto reserved low memory area.

No, that is not just disable. User could want more like 128M instead of 72M.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 19:09       ` Vivek Goyal
@ 2013-04-02 20:04         ` Yinghai Lu
  0 siblings, 0 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 20:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 2, 2013 at 12:09 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>> Actually we still support only one region that is could be high or low,
>> and that extra low is just for workaround
>> buggy system that does not support iommu with kdump.
>
> Well, crashkernel=X;high crashkernel=Y;low will reserve two ranges (one
> high and one low). So in some cases we are supporting 2 and in some
> cases we are supporting 1 range.

when you have more 4G ram or not.

and when we have two ranges, low range actually is not for second kernel to
be loaded. and it is only for swiotlb in case.

>
> So I still think that let us stick to old behavior of supporting one
> crashkernel= option. Last crashkernel= option on command line will be
> acted upon.

Yes, only one crashkernel=;high and one crashkernel=;low is honored.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 20:00           ` Yinghai Lu
@ 2013-04-02 20:11             ` Vivek Goyal
  2013-04-02 20:25               ` Vivek Goyal
  2013-04-02 20:36               ` Yinghai Lu
  0 siblings, 2 replies; 25+ messages in thread
From: Vivek Goyal @ 2013-04-02 20:11 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 02, 2013 at 01:00:46PM -0700, Yinghai Lu wrote:
> On Tue, Apr 2, 2013 at 12:11 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
> >> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> >> > aka:
> >> > old kexec-tools stay with "crashkernel=X"
> >> > new kexec-tools stay with
> >> > 1. like old kexec tools
> >> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> >> > Y could be 100M or 0 etc.
> >>
> >> I keep the old logic like:
> >> if there are several "crashkernel=X,high", only last one is honored.
> >> if there are several "crashkernel=Y,low", only last one is honored.
> >
> > Yes but if different types of crashkernel= options are mixes then
> > behavior is not defined.
> 
> dmesg or /proc/iomem will show them what is finally reserved.
> 
> >
> > crashkernel=X,high crashkernel=X
> 
> ==> crashkernel=X is tossed away.

You are just describing what your code does. There is no theme or
justification behind this behavior. There is no uniformity. A user can
question that so far you used to honor last crashkernel= parameter and
suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
overriding crashkernel=X and it is not obivious why.

> 
> > crashkernel=X,high crashkernel=Y;low
> 
> normal case. if user want more or disable low range.

Again, behavior is not clear to user. Please stop describing your code
behavior. Discuss more in terms of presenting a consistent interface to
user.

> 
> > crashkernel=Y;low crashkernel=X
> 
> crashkernel=X will be used.

Why? When crashkernel=X is specified with crashkernel=Y;high, high takes
over but when crashkernel=X is specified with crashkernel=Y;low,
crashkernel=X takes over. These is highly unintutive.

> 
> >
> > And possibilities go on. So I think it makes life simpler if we always
> > parse last crashkernel= option and act upon that. And use
> > crashkernel_no_auto_low to opt out of auto reserved low memory area.
> 
> No, that is not just disable. User could want more like 128M instead of 72M.

If user wants 128M in low memory, they will just specify
crashkernel=128M;low

If they want to control multiple ranges of memory, then that's the feature
we currently don't support. Currently we support only reserving one range
of memory.

If you want to support multiple ranges of memory,then do it properly.
Parse all crashkernel= options, prepare a list of memory to be reserved
and unreserved, resolve all the conflicts between various options and
then reserve the memory. But that does not seem to be a requirement at
this point of time.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 20:11             ` Vivek Goyal
@ 2013-04-02 20:25               ` Vivek Goyal
  2013-04-02 20:36               ` Yinghai Lu
  1 sibling, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2013-04-02 20:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 02, 2013 at 04:11:48PM -0400, Vivek Goyal wrote:

[..]
> > No, that is not just disable. User could want more like 128M instead of 72M.

So apart from swiotlb, are there other scenarios where we need to reserve
low memory (With main memory reserved high).

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 20:11             ` Vivek Goyal
  2013-04-02 20:25               ` Vivek Goyal
@ 2013-04-02 20:36               ` Yinghai Lu
  2013-04-03 13:18                 ` Vivek Goyal
  1 sibling, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-04-02 20:36 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 2, 2013 at 1:11 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 02, 2013 at 01:00:46PM -0700, Yinghai Lu wrote:
>> On Tue, Apr 2, 2013 at 12:11 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
>> >> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> >> > aka:
>> >> > old kexec-tools stay with "crashkernel=X"
>> >> > new kexec-tools stay with
>> >> > 1. like old kexec tools
>> >> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
>> >> > Y could be 100M or 0 etc.
>> >>
>> >> I keep the old logic like:
>> >> if there are several "crashkernel=X,high", only last one is honored.
>> >> if there are several "crashkernel=Y,low", only last one is honored.
>> >
>> > Yes but if different types of crashkernel= options are mixes then
>> > behavior is not defined.
>>
>> dmesg or /proc/iomem will show them what is finally reserved.
>>
>> >
>> > crashkernel=X,high crashkernel=X
>>
>> ==> crashkernel=X is tossed away.
>
> You are just describing what your code does. There is no theme or
> justification behind this behavior. There is no uniformity. A user can
> question that so far you used to honor last crashkernel= parameter and
> suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
> overriding crashkernel=X and it is not obivious why.

Let me repeat again:
we keep crashkernel=X old behavior with old kexec-tools.
crashkernel=X;high is for new kexec-tools that support loading high.

If the user want to use ,high but still with old kexec-tools, that is
not going to work.

Can we just keep it separated?

>
>>
>> > crashkernel=X,high crashkernel=Y;low
>>
>> normal case. if user want more or disable low range.
>
> Again, behavior is not clear to user. Please stop describing your code
> behavior. Discuss more in terms of presenting a consistent interface to
> user.
>
>>
>> > crashkernel=Y;low crashkernel=X
>>
>> crashkernel=X will be used.
>
> Why? When crashkernel=X is specified with crashkernel=Y;high, high takes
> over but when crashkernel=X is specified with crashkernel=Y;low,
> crashkernel=X takes over. These is highly unintutive.
>
>>
>> >
>> > And possibilities go on. So I think it makes life simpler if we always
>> > parse last crashkernel= option and act upon that. And use
>> > crashkernel_no_auto_low to opt out of auto reserved low memory area.
>>
>> No, that is not just disable. User could want more like 128M instead of 72M.
>
> If user wants 128M in low memory, they will just specify
> crashkernel=128M;low

in the kernel-parameter.txt, already says ;low is need to used with ;high.

>
> If they want to control multiple ranges of memory, then that's the feature
> we currently don't support. Currently we support only reserving one range
> of memory.
>
> If you want to support multiple ranges of memory,then do it properly.
> Parse all crashkernel= options, prepare a list of memory to be reserved
> and unreserved, resolve all the conflicts between various options and
> then reserve the memory. But that does not seem to be a requirement at
> this point of time.

No we does not support multiple ranges, as it will need more changes
in kexec-tools.

Can we stop here with those four patches?

Later, we can extend it if it is really needed.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-02 20:36               ` Yinghai Lu
@ 2013-04-03 13:18                 ` Vivek Goyal
  2013-04-03 17:12                   ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Vivek Goyal @ 2013-04-03 13:18 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Tue, Apr 02, 2013 at 01:36:02PM -0700, Yinghai Lu wrote:

[..]
> > You are just describing what your code does. There is no theme or
> > justification behind this behavior. There is no uniformity. A user can
> > question that so far you used to honor last crashkernel= parameter and
> > suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
> > overriding crashkernel=X and it is not obivious why.
> 
> Let me repeat again:
> we keep crashkernel=X old behavior with old kexec-tools.
> crashkernel=X;high is for new kexec-tools that support loading high.
> 
> If the user want to use ,high but still with old kexec-tools, that is
> not going to work.
> 
> Can we just keep it separated?

Kernel does not know about old kexec-tools or new kexec-tools. Neither
kernel can enforce what command line options are passed by user. So
kernel needs to define a clean interface here which is easily understood
and is extensible also in future.

[..]
> >
> > If user wants 128M in low memory, they will just specify
> > crashkernel=128M;low
> 
> in the kernel-parameter.txt, already says ;low is need to used with ;high.

But why are we tying ;low to ;high. One should be easily extend
crashkernel=X to be able to reserve memory above 4G if specified amount
is not available below 4G. In that case also one might want to reserve
some low memory?

For that matter crashkernel=range1:size,range2:size syntax should be
extendible too to reserve memory above 4G if desired size of memory
is not available in low memory.

Now in those cases too, one would like to have 72M of low memory
reserved. So ;low shoud not be tied to ;high necessarily.

In fact current code does not care whetehr ;high was specified or not.
If memory is reserved above 4G, ;low code will kick in.

> 
> >
> > If they want to control multiple ranges of memory, then that's the feature
> > we currently don't support. Currently we support only reserving one range
> > of memory.
> >
> > If you want to support multiple ranges of memory,then do it properly.
> > Parse all crashkernel= options, prepare a list of memory to be reserved
> > and unreserved, resolve all the conflicts between various options and
> > then reserve the memory. But that does not seem to be a requirement at
> > this point of time.
> 
> No we does not support multiple ranges, as it will need more changes
> in kexec-tools.
> 
> Can we stop here with those four patches?
> 
> Later, we can extend it if it is really needed.

crashkernel= options are already confusing. I think we with this patchset
we will just make them even more confusing and future extensions
difficult.

We really need to stick to the notion of only one crashkernel= option
is accepted and that is last one on command line. And if need be,
we need to work on multi range reservation feature where we process
and reserve ranges as specified by all crashkernel= parameters on
command line.

Creating new combinations where some crashkernel= are preferred over
others and some crashkernel= options work with only selected crashkernel=
options, is asking for trouble, especially keeping in mind future
extensions.

I prefer following for 3.9.

- process only right most crashkernel= option.
- implement crashkernel_no_auto_low option to opt out of auto reserved
  low memory
- implement crashkernel=X;high to support high memory reservations.

And now old kexec-tools user can use crashkernel=X while users needing
high memory reservation can use crashkernel=X;high.

If you really want to support user defined crashkernel=X;low along with
crashkernel=Y;high, that is really a multi range reservation feature and
need to be implemented properly instead of coming up with short cuts.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-03 13:18                 ` Vivek Goyal
@ 2013-04-03 17:12                   ` Yinghai Lu
  2013-04-03 17:32                     ` Yinghai Lu
  2013-04-03 17:36                     ` Vivek Goyal
  0 siblings, 2 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-03 17:12 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 02, 2013 at 01:36:02PM -0700, Yinghai Lu wrote:
>
> [..]
>> > You are just describing what your code does. There is no theme or
>> > justification behind this behavior. There is no uniformity. A user can
>> > question that so far you used to honor last crashkernel= parameter and
>> > suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
>> > overriding crashkernel=X and it is not obivious why.
>>
>> Let me repeat again:
>> we keep crashkernel=X old behavior with old kexec-tools.
>> crashkernel=X;high is for new kexec-tools that support loading high.
>>
>> If the user want to use ,high but still with old kexec-tools, that is
>> not going to work.
>>
>> Can we just keep it separated?
>
> Kernel does not know about old kexec-tools or new kexec-tools. Neither
> kernel can enforce what command line options are passed by user. So
> kernel needs to define a clean interface here which is easily understood
> and is extensible also in future.

Looks you are chasing wrong direction.

Those four patches fixes the regression that Wang and you reported,
User don't need to change their kexec-tools and boot command lines
kdump still works.

We will never can stop user doing crazy thing with their system.

>
> [..]
>> >
>> > If user wants 128M in low memory, they will just specify
>> > crashkernel=128M;low
>>
>> in the kernel-parameter.txt, already says ;low is need to used with ;high.
>
> But why are we tying ;low to ;high. One should be easily extend
> crashkernel=X to be able to reserve memory above 4G if specified amount
> is not available below 4G. In that case also one might want to reserve
> some low memory?

I want to keep crashkernel=X to the old behavior.

If you want to have crashkernel=X to allocate high above 4G, old kexec-tools
will not work with new kernel.

>
> For that matter crashkernel=range1:size,range2:size syntax should be
> extendible too to reserve memory above 4G if desired size of memory
> is not available in low memory.
>
> Now in those cases too, one would like to have 72M of low memory
> reserved. So ;low shoud not be tied to ;high necessarily.
>
> In fact current code does not care whetehr ;high was specified or not.
> If memory is reserved above 4G, ;low code will kick in.

No, that is not right.

only when ;high is specified, kernel will try to allocate high above 4G.


>
>>
>> >
>> > If they want to control multiple ranges of memory, then that's the feature
>> > we currently don't support. Currently we support only reserving one range
>> > of memory.
>> >
>> > If you want to support multiple ranges of memory,then do it properly.
>> > Parse all crashkernel= options, prepare a list of memory to be reserved
>> > and unreserved, resolve all the conflicts between various options and
>> > then reserve the memory. But that does not seem to be a requirement at
>> > this point of time.
>>
>> No we does not support multiple ranges, as it will need more changes
>> in kexec-tools.
>>
>> Can we stop here with those four patches?
>>
>> Later, we can extend it if it is really needed.
>
> crashkernel= options are already confusing. I think we with this patchset
> we will just make them even more confusing and future extensions
> difficult.

So keep crashkernel= without high and low to old behavior.

>
> We really need to stick to the notion of only one crashkernel= option
> is accepted and that is last one on command line. And if need be,
> we need to work on multi range reservation feature where we process
> and reserve ranges as specified by all crashkernel= parameters on
> command line.

That is kept.

and only last high is honored

>
> Creating new combinations where some crashkernel= are preferred over
> others and some crashkernel= options work with only selected crashkernel=
> options, is asking for trouble, especially keeping in mind future
> extensions.

I don't think so.

old conf that works before still use crashkernel= with high and low.
old conf that does not work, could switch to crashkernel=;high/low
with new kexec-tools

>
> I prefer following for 3.9.
>
> - process only right most crashkernel= option.

what is "right most" ?
only last crashkernel=X is honored?
I restored that already with those four patches.

> - implement crashkernel_no_auto_low option to opt out of auto reserved
>   low memory

No, that is ugly.

> - implement crashkernel=X;high to support high memory reservations.
>
> And now old kexec-tools user can use crashkernel=X while users needing
> high memory reservation can use crashkernel=X;high.

The four patches did not do that?

>
> If you really want to support user defined crashkernel=X;low along with
> crashkernel=Y;high, that is really a multi range reservation feature and
> need to be implemented properly instead of coming up with short cuts.

No it is not.

It's *you* want me to change "Crash kernel low" to "Crash kernel".

Do we need to drop second patch? So will still keep
"Crash kernel low" in /proc/iomem?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-03 17:12                   ` Yinghai Lu
@ 2013-04-03 17:32                     ` Yinghai Lu
  2013-04-03 17:47                       ` Vivek Goyal
  2013-04-03 17:36                     ` Vivek Goyal
  1 sibling, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-04-03 17:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 3, 2013 at 10:12 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>
>> - implement crashkernel_no_auto_low option to opt out of auto reserved
>>   low memory
>
> No, that is ugly.
...
>
> It's *you* want me to change "Crash kernel low" to "Crash kernel".
>
> Do we need to drop second patch? So will still keep
> "Crash kernel low" in /proc/iomem?

also we can drop the last patch and keep "crashkernel_high=" and
"crashkernel_low="

as you even like to introduce "crashkernel_no_auto_low".

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-03 17:12                   ` Yinghai Lu
  2013-04-03 17:32                     ` Yinghai Lu
@ 2013-04-03 17:36                     ` Vivek Goyal
  1 sibling, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2013-04-03 17:36 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 03, 2013 at 10:12:46AM -0700, Yinghai Lu wrote:

[..]
> >> Can we just keep it separated?
> >
> > Kernel does not know about old kexec-tools or new kexec-tools. Neither
> > kernel can enforce what command line options are passed by user. So
> > kernel needs to define a clean interface here which is easily understood
> > and is extensible also in future.
> 
> Looks you are chasing wrong direction.

I don't think so. Once we are defining bunch of crashkernel= parameters
we need to make sure it is well understood how do they work together.

> 
> Those four patches fixes the regression that Wang and you reported,
> User don't need to change their kexec-tools and boot command lines
> kdump still works.

I am not objecting to that. I am objecting to introducing unexpected
behavior in crashkernel= command line space and ignoring how does
it co-exist with existing syntax.

> 
> We will never can stop user doing crazy thing with their system.

Yes, but we need to make sure if bunch of crashkernel= are passed on
kernel command line then behavior is well defined and extensible for
future usage.

[..]
> > But why are we tying ;low to ;high. One should be easily extend
> > crashkernel=X to be able to reserve memory above 4G if specified amount
> > is not available below 4G. In that case also one might want to reserve
> > some low memory?
> 
> I want to keep crashkernel=X to the old behavior.
> 
> If you want to have crashkernel=X to allocate high above 4G, old kexec-tools
> will not work with new kernel.

It does not work anyway. Because current crashkernel=X will fail to
reserve memory if sufficient memory is not available below 896MB. So no
surprises there.

It will help new kexec-tools continue to work with crashkernel=X even
in high memory ranges.

> >
> > For that matter crashkernel=range1:size,range2:size syntax should be
> > extendible too to reserve memory above 4G if desired size of memory
> > is not available in low memory.
> >
> > Now in those cases too, one would like to have 72M of low memory
> > reserved. So ;low shoud not be tied to ;high necessarily.
> >
> > In fact current code does not care whetehr ;high was specified or not.
> > If memory is reserved above 4G, ;low code will kick in.
> 
> No, that is not right.
> 
> only when ;high is specified, kernel will try to allocate high above 4G.

As of today. But one can always extend crashkernel=X to allocate from
high addresses if memory is not available in low addresses without
breaking old tools.

Current ;low logic does not care whether high reservation was done using
crashkernel=X or crashkernel=X;high. And it should not. Tying ;low with
;high is the problem. Each crashkernel= directive should be able to 
specify its own range to reserve with constraints. There is no dependency
on other crashkernel= options passed. And that keeps the behavior well
defined.

If we start introducing dependency between various crashkernel= options
our behavior matrix will explode and it will be very difficult to explain
how does it work.

[..]
> > We really need to stick to the notion of only one crashkernel= option
> > is accepted and that is last one on command line. And if need be,
> > we need to work on multi range reservation feature where we process
> > and reserve ranges as specified by all crashkernel= parameters on
> > command line.
> 
> That is kept.
> 
> and only last high is honored

What about rest of the crashkernel=. I am not sure why are you not
seeing that you are stepping onto to already defined crashkernel=
command line option and breaking its semantics.

If you were defining crashkernel_foo, I couldn't care less.

Given the fact you are using crashkernel=, you need to take already
defined parameters in to consideration and stick to those semantics.

- Either support single range reservation and always use rightmost
  crashkernel= option.

- Or support multiple ranges and process all the crashkernel= options
  as specified on command line.

Please don't define more modes here.
 
> 
> >
> > Creating new combinations where some crashkernel= are preferred over
> > others and some crashkernel= options work with only selected crashkernel=
> > options, is asking for trouble, especially keeping in mind future
> > extensions.
> 
> I don't think so.
> 
> old conf that works before still use crashkernel= with high and low.
> old conf that does not work, could switch to crashkernel=;high/low
> with new kexec-tools

Please come out of this old conf and new conf mode. That is specific
usage of interface you are providing. But interface semantics should
still be well defined. And currently they are not.

> 
> >
> > I prefer following for 3.9.
> >
> > - process only right most crashkernel= option.
> 
> what is "right most" ?
> only last crashkernel=X is honored?
> I restored that already with those four patches.

only last crashkernel= is honored. It could be either crashkernel=X
or crashkernel=X;high or crashkernel=X;<option1>;<option2>;....

Point being crashkernel= space is taken and there is scope for defining
more options to it as our needs grow. 

> 
> > - implement crashkernel_no_auto_low option to opt out of auto reserved
> >   low memory
> 
> No, that is ugly.

It might be but it is better than special casing ;high or ;low.

> 
> > - implement crashkernel=X;high to support high memory reservations.
> >
> > And now old kexec-tools user can use crashkernel=X while users needing
> > high memory reservation can use crashkernel=X;high.
> 
> The four patches did not do that?

I am objecting to.

- Not parsing right most crashkernel=
- Tying ;high and ;low together.

> 
> >
> > If you really want to support user defined crashkernel=X;low along with
> > crashkernel=Y;high, that is really a multi range reservation feature and
> > need to be implemented properly instead of coming up with short cuts.
> 
> No it is not.
> 
> It's *you* want me to change "Crash kernel low" to "Crash kernel".
> 
> Do we need to drop second patch? So will still keep
> "Crash kernel low" in /proc/iomem?

No. I want to keep everything named "Crash kernel". There is no point
in naming low memory as "Crash kernel low".

Just that right now we don't support reserving multiple range feature,
execpt the exception of a low memory reserved automatically. And we
are providing a parameter to disable reservation of low memory.

If we don't support multiple ranges, don't try to special case it by
;high and ;low combination. Then one needs to parse all crashkernel=
options passed on command line (including crashkernel=X) and reserve
all the ranges as specified by various crashkernel= options.

You can't say, well ;high is specified so I am going to process only
crashkernel=X;high and crashkernel=Y;low and ignore rest of the
crashkernel= options. That does not make sense to me.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-03 17:32                     ` Yinghai Lu
@ 2013-04-03 17:47                       ` Vivek Goyal
  2013-04-03 20:38                         ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Vivek Goyal @ 2013-04-03 17:47 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 03, 2013 at 10:32:23AM -0700, Yinghai Lu wrote:
> On Wed, Apr 3, 2013 at 10:12 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> >> - implement crashkernel_no_auto_low option to opt out of auto reserved
> >>   low memory
> >
> > No, that is ugly.
> ...
> >
> > It's *you* want me to change "Crash kernel low" to "Crash kernel".
> >
> > Do we need to drop second patch? So will still keep
> > "Crash kernel low" in /proc/iomem?
> 
> also we can drop the last patch and keep "crashkernel_high=" and
> "crashkernel_low="

as hpa mentioned, we should express memory reservation and dependency
of it in crashkernel= options. So introducing crashkernel_high or
crashkernel_low, just because you we don't want to support multiple
ranges is a kludge.

> 
> as you even like to introduce "crashkernel_no_auto_low".

This is a kludge too for ease of use. At least it does not spoil 
crashkernel= space and also works with existing crashkernel=X
parameters.

You know what, I think multiple ranges has another problem. And that is
all of the kexec/kdump code is written thinking there is one contiguous
reserved range.

        /* Verify we have a valid entry point */
        if ((entry < crashk_res.start) || (entry > crashk_res.end)) {
                result = -EADDRNOTAVAIL;
                goto out;
        }


Also look at crash_shrink_memory().

So what I am saying that all our code is written assuming there is one
single reserved range. Now if we need to reserve two ranges, then let
us make it generic to suppoprt multiple ranges instead of hardcoding
things and assume there can be 2 ranges. That will be a more generic
solution.

So how about this.

- In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
  memory. Support reservation of single range only. It could be either
  high or low.

- Those who are using iommu, they can use crashkernel=X;high. Old code
  can continue to use crashkernel=X and get memory reserved in low
  memory areas.

- In 3.10 add a feature to support multiple crash reserved ranges.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-03 17:47                       ` Vivek Goyal
@ 2013-04-03 20:38                         ` Yinghai Lu
  2013-04-03 21:00                           ` Vivek Goyal
  0 siblings, 1 reply; 25+ messages in thread
From: Yinghai Lu @ 2013-04-03 20:38 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> So what I am saying that all our code is written assuming there is one
> single reserved range. Now if we need to reserve two ranges, then let
> us make it generic to suppoprt multiple ranges instead of hardcoding
> things and assume there can be 2 ranges. That will be a more generic
> solution.

I don't think we have case that we need to support more than two ranges.

We only need to have one big range above 4G, and put second kernel and
initrd in it.
and another low one is only for switotlb or others that will be used by
second kernel.

>
> So how about this.
>
> - In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
>   memory. Support reservation of single range only. It could be either
>   high or low.
>
> - Those who are using iommu, they can use crashkernel=X;high. Old code
>   can continue to use crashkernel=X and get memory reserved in low
>   memory areas.

That will not handle the case: big system that crashkernel=X;high
and kdump does not work with iommu.

>
> - In 3.10 add a feature to support multiple crash reserved ranges.

Again, we only need one high and one low range.
We don't need to support more than two ranges for crash kernel.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-03 20:38                         ` Yinghai Lu
@ 2013-04-03 21:00                           ` Vivek Goyal
  2013-04-04  0:56                             ` Yinghai Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Vivek Goyal @ 2013-04-03 21:00 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 03, 2013 at 01:38:56PM -0700, Yinghai Lu wrote:
> On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > So what I am saying that all our code is written assuming there is one
> > single reserved range. Now if we need to reserve two ranges, then let
> > us make it generic to suppoprt multiple ranges instead of hardcoding
> > things and assume there can be 2 ranges. That will be a more generic
> > solution.
> 
> I don't think we have case that we need to support more than two ranges.
> 

never say never.

One of the biggest problems is that we are trying to reserve all the
memory at system bootup time using kernel command line.

Why not reserve memory using some kernel interface after system has booted.
That will make life much simpler.

Reason being that currently we depend on one big single chunk being
available in low memory area. And after boot there is no guarantee
we will have that memory.

But what if we just reserve enough memory for kernel and initramfs 
during boot and rest of the memory is reserved from user space. If
system has more memory, there are high chances that after boot we will
get memory immediately.

What if we don't get a single range of memory, but multiple ranges,
we should still be able to make use of all those ranges. And I think
that is one possible area where multiple ranges could be useful.

So yes, we don't have an immediate use case, but one can always pop
up in future as we try to extend the functionality.


> We only need to have one big range above 4G, and put second kernel and
> initrd in it.
> and another low one is only for switotlb or others that will be used by
> second kernel.
> 
> >
> > So how about this.
> >
> > - In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
> >   memory. Support reservation of single range only. It could be either
> >   high or low.
> >
> > - Those who are using iommu, they can use crashkernel=X;high. Old code
> >   can continue to use crashkernel=X and get memory reserved in low
> >   memory areas.
> 
> That will not handle the case: big system that crashkernel=X;high
> and kdump does not work with iommu.

If we start supporting multiple ranges properly in 3.10, then it will.
And we have never supported it so it is not a regression. Delaying a
feature by a realease should be worth it in an attempt to do it
properly. 

> 
> >
> > - In 3.10 add a feature to support multiple crash reserved ranges.
> 
> Again, we only need one high and one low range.
> We don't need to support more than two ranges for crash kernel.

For your current use case, you need one high and one low currently. But 
there can always be scenarios where multiple crash ranges are required,
like reserving memory from user space.

Anyway, if you are adamant on hardcoding this stuff, please don't
use crashkernel= space. Continue to use crashkernel_high and/or
crashkernel_low options. That way it allows us to extend crashkernel=
in the form of crashkernel=X;<range-options>;<option2>;...
and support multiple range feature properly down the line and not be
bogged down with this new strange semantics.

Also docuemnt very clearly that specifying crashkernel_high ignores
any of the crashkernel= options.

Also what happens to reserve memory release interface.  IIUC, there
is no way to release crashkernel_low memory from user space and
only memory reserved by crashkernel_high will be released?

Seriously, why are you rushing with this high/low hardcoding. Why
not wait for one more release and support multiple ranges properly
both in kernel and kexec-tools.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-03 21:00                           ` Vivek Goyal
@ 2013-04-04  0:56                             ` Yinghai Lu
  2013-04-04 13:41                               ` Vivek Goyal
  2013-04-04 13:51                               ` Vivek Goyal
  0 siblings, 2 replies; 25+ messages in thread
From: Yinghai Lu @ 2013-04-04  0:56 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 3, 2013 at 2:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Apr 03, 2013 at 01:38:56PM -0700, Yinghai Lu wrote:
>> On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > So what I am saying that all our code is written assuming there is one
>> > single reserved range. Now if we need to reserve two ranges, then let
>> > us make it generic to suppoprt multiple ranges instead of hardcoding
>> > things and assume there can be 2 ranges. That will be a more generic
>> > solution.
>>
>> I don't think we have case that we need to support more than two ranges.
>>
>
> never say never.

One big rang under 4G is working well till second kernel need more than 512M
on bigger system with more memory.

Now we make the 64bit kernel/initrd etc could stay above 4G, we can have
enough range above 4G.

The only problem is during that second kernel running it need some ram
under 4G when iommu can not be used.

So have one big range above 4G, and a small range below 4g like 72M
is rational in a long run.

>
> One of the biggest problems is that we are trying to reserve all the
> memory at system bootup time using kernel command line.
>
> Why not reserve memory using some kernel interface after system has booted.
> That will make life much simpler.
>
> Reason being that currently we depend on one big single chunk being
> available in low memory area. And after boot there is no guarantee
> we will have that memory.
>
> But what if we just reserve enough memory for kernel and initramfs
> during boot and rest of the memory is reserved from user space. If
> system has more memory, there are high chances that after boot we will
> get memory immediately.

Allocate 72M continuous below 4G after system is booted up?

>
> What if we don't get a single range of memory, but multiple ranges,
> we should still be able to make use of all those ranges. And I think
> that is one possible area where multiple ranges could be useful.

swiotlb does not need continuous area?

>
> So yes, we don't have an immediate use case, but one can always pop
> up in future as we try to extend the functionality.

I don't think we need to extend it anymore with high/low two range support.

>
>
>> We only need to have one big range above 4G, and put second kernel and
>> initrd in it.
>> and another low one is only for switotlb or others that will be used by
>> second kernel.
>>
>> >
>> > So how about this.
>> >
>> > - In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
>> >   memory. Support reservation of single range only. It could be either
>> >   high or low.
>> >
>> > - Those who are using iommu, they can use crashkernel=X;high. Old code
>> >   can continue to use crashkernel=X and get memory reserved in low
>> >   memory areas.
>>
>> That will not handle the case: big system that crashkernel=X;high
>> and kdump does not work with iommu.
>
> If we start supporting multiple ranges properly in 3.10, then it will.
> And we have never supported it so it is not a regression. Delaying a
> feature by a realease should be worth it in an attempt to do it
> properly.
>
>>
>> >
>> > - In 3.10 add a feature to support multiple crash reserved ranges.
>>
>> Again, we only need one high and one low range.
>> We don't need to support more than two ranges for crash kernel.
>
> For your current use case, you need one high and one low currently. But
> there can always be scenarios where multiple crash ranges are required,
> like reserving memory from user space.
>
> Anyway, if you are adamant on hardcoding this stuff, please don't
> use crashkernel= space. Continue to use crashkernel_high and/or
> crashkernel_low options. That way it allows us to extend crashkernel=
> in the form of crashkernel=X;<range-options>;<option2>;...
> and support multiple range feature properly down the line and not be
> bogged down with this new strange semantics.
>
> Also docuemnt very clearly that specifying crashkernel_high ignores
> any of the crashkernel= options.

will change back to crashkernel= override crashkernel_high.

>
> Also what happens to reserve memory release interface.  IIUC, there
> is no way to release crashkernel_low memory from user space and
> only memory reserved by crashkernel_high will be released?

can not decode.


>
> Seriously, why are you rushing with this high/low hardcoding. Why
> not wait for one more release and support multiple ranges properly
> both in kernel and kexec-tools.

As more then 2 range is not needed.
These four patches are complete solution.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-04  0:56                             ` Yinghai Lu
@ 2013-04-04 13:41                               ` Vivek Goyal
  2013-04-04 13:51                               ` Vivek Goyal
  1 sibling, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2013-04-04 13:41 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 03, 2013 at 05:56:00PM -0700, Yinghai Lu wrote:
> On Wed, Apr 3, 2013 at 2:00 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Apr 03, 2013 at 01:38:56PM -0700, Yinghai Lu wrote:
> >> On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > So what I am saying that all our code is written assuming there is one
> >> > single reserved range. Now if we need to reserve two ranges, then let
> >> > us make it generic to suppoprt multiple ranges instead of hardcoding
> >> > things and assume there can be 2 ranges. That will be a more generic
> >> > solution.
> >>
> >> I don't think we have case that we need to support more than two ranges.
> >>
> >
> > never say never.
> 
> One big rang under 4G is working well till second kernel need more than 512M
> on bigger system with more memory.
> 
> Now we make the 64bit kernel/initrd etc could stay above 4G, we can have
> enough range above 4G.
> 
> The only problem is during that second kernel running it need some ram
> under 4G when iommu can not be used.
> 
> So have one big range above 4G, and a small range below 4g like 72M
> is rational in a long run.

That's one use case. You have not addressed the issue of reserving memory
after boot and not finding one contiguous range and finding smaller
contiguous ranges.

> 
> >
> > One of the biggest problems is that we are trying to reserve all the
> > memory at system bootup time using kernel command line.
> >
> > Why not reserve memory using some kernel interface after system has booted.
> > That will make life much simpler.
> >
> > Reason being that currently we depend on one big single chunk being
> > available in low memory area. And after boot there is no guarantee
> > we will have that memory.
> >
> > But what if we just reserve enough memory for kernel and initramfs
> > during boot and rest of the memory is reserved from user space. If
> > system has more memory, there are high chances that after boot we will
> > get memory immediately.
> 
> Allocate 72M continuous below 4G after system is booted up?
> 
> >
> > What if we don't get a single range of memory, but multiple ranges,
> > we should still be able to make use of all those ranges. And I think
> > that is one possible area where multiple ranges could be useful.
> 
> swiotlb does not need continuous area?

One chunk could be below 4G during boot. Say 128M in low memory area
which is sufficient to load kernel, initrd and swiotlb. Rest of the
memory reservation could come after boot on need basis and that does
not have to be contiguous.

> 
> >
> > So yes, we don't have an immediate use case, but one can always pop
> > up in future as we try to extend the functionality.
> 
> I don't think we need to extend it anymore with high/low two range support.
> 
> >
> >
> >> We only need to have one big range above 4G, and put second kernel and
> >> initrd in it.
> >> and another low one is only for switotlb or others that will be used by
> >> second kernel.
> >>
> >> >
> >> > So how about this.
> >> >
> >> > - In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
> >> >   memory. Support reservation of single range only. It could be either
> >> >   high or low.
> >> >
> >> > - Those who are using iommu, they can use crashkernel=X;high. Old code
> >> >   can continue to use crashkernel=X and get memory reserved in low
> >> >   memory areas.
> >>
> >> That will not handle the case: big system that crashkernel=X;high
> >> and kdump does not work with iommu.
> >
> > If we start supporting multiple ranges properly in 3.10, then it will.
> > And we have never supported it so it is not a regression. Delaying a
> > feature by a realease should be worth it in an attempt to do it
> > properly.
> >
> >>
> >> >
> >> > - In 3.10 add a feature to support multiple crash reserved ranges.
> >>
> >> Again, we only need one high and one low range.
> >> We don't need to support more than two ranges for crash kernel.
> >
> > For your current use case, you need one high and one low currently. But
> > there can always be scenarios where multiple crash ranges are required,
> > like reserving memory from user space.
> >
> > Anyway, if you are adamant on hardcoding this stuff, please don't
> > use crashkernel= space. Continue to use crashkernel_high and/or
> > crashkernel_low options. That way it allows us to extend crashkernel=
> > in the form of crashkernel=X;<range-options>;<option2>;...
> > and support multiple range feature properly down the line and not be
> > bogged down with this new strange semantics.
> >
> > Also docuemnt very clearly that specifying crashkernel_high ignores
> > any of the crashkernel= options.
> 
> will change back to crashkernel= override crashkernel_high.
> 

[..]
> >
> > Also what happens to reserve memory release interface.  IIUC, there
> > is no way to release crashkernel_low memory from user space and
> > only memory reserved by crashkernel_high will be released?
> 
> can not decode.

How do we release low memory from user space. Look at
crash_shrink_memory().


> 
> 
> >
> > Seriously, why are you rushing with this high/low hardcoding. Why
> > not wait for one more release and support multiple ranges properly
> > both in kernel and kexec-tools.
> 
> As more then 2 range is not needed.
> These four patches are complete solution.
> 
> Thanks
> 
> Yinghai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low
  2013-04-04  0:56                             ` Yinghai Lu
  2013-04-04 13:41                               ` Vivek Goyal
@ 2013-04-04 13:51                               ` Vivek Goyal
  1 sibling, 0 replies; 25+ messages in thread
From: Vivek Goyal @ 2013-04-04 13:51 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao,
	Eric W. Biederman, Linux Kernel Mailing List

On Wed, Apr 03, 2013 at 05:56:00PM -0700, Yinghai Lu wrote:

[..]
> 
> One big rang under 4G is working well till second kernel need more than 512M
> on bigger system with more memory.

Currently one range is allocated below 896MB (and not 4G). So if we extend
crashkernel=X to first try below 896MB and then reserve below 4G, we are
just fine. And in fact we should be able to extend to even look beyond
4G if nothing suitable is available below 4G.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-04-04 13:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-02 17:19 [PATCH 0/4] x86, kdump: Fix crashkernel high with old kexec-tools Yinghai Lu
2013-04-02 17:19 ` [PATCH 1/4] x86, kdump: Set crashkernel_low automatically Yinghai Lu
2013-04-02 17:19 ` [PATCH 2/4] kexec: use Crash kernel for Crash kernel low Yinghai Lu
2013-04-02 17:19 ` [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low Yinghai Lu
2013-04-02 18:06   ` Vivek Goyal
2013-04-02 18:42     ` Yinghai Lu
2013-04-02 18:49       ` Yinghai Lu
2013-04-02 19:11         ` Vivek Goyal
2013-04-02 20:00           ` Yinghai Lu
2013-04-02 20:11             ` Vivek Goyal
2013-04-02 20:25               ` Vivek Goyal
2013-04-02 20:36               ` Yinghai Lu
2013-04-03 13:18                 ` Vivek Goyal
2013-04-03 17:12                   ` Yinghai Lu
2013-04-03 17:32                     ` Yinghai Lu
2013-04-03 17:47                       ` Vivek Goyal
2013-04-03 20:38                         ` Yinghai Lu
2013-04-03 21:00                           ` Vivek Goyal
2013-04-04  0:56                             ` Yinghai Lu
2013-04-04 13:41                               ` Vivek Goyal
2013-04-04 13:51                               ` Vivek Goyal
2013-04-03 17:36                     ` Vivek Goyal
2013-04-02 19:09       ` Vivek Goyal
2013-04-02 20:04         ` Yinghai Lu
2013-04-02 17:19 ` [PATCH] x86, kdump: Change crashkernel_high/low= to crashkernel=;high/low Yinghai Lu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.