* [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default
@ 2019-04-21  3:50 Dave Young
  2019-04-21  3:51 ` [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory Dave Young
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Dave Young @ 2019-04-21  3:50 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, vgoyal, bhe, piliu, Yinghai Lu,
	Eric Biederman

The kdump crashkernel low reservation is limited to under 896M even on
x86_64. This obscure and miserable limitation exists for compatibility
with old kexec-tools, but the reason is not documented anywhere.

Some more tests/investigations about the background:
a) Previously, old kexec-tools could only load purgatory to memory under
   2G. Eric removed that limitation in 2012 in kexec-tools:
   Commit b4f9f8599679 ("kexec x86_64: Make purgatory relocatable anywhere
   in the 64bit address space.")

b) Back in 2013, Yinghai removed all the limitations in new kexec-tools,
   so bzImage64 can be loaded anywhere.
   Commit 82c3dd2280d2 ("kexec, x86_64: Load bzImage64 above 4G")

c) Test results with old kexec-tools on old and latest kernels:
  1. Old kexec-tools can no longer be built with a modern toolchain, so
     I built it in a RHEL6 VM.
  2. kexec-tools 2.0.0 does not work with the latest kernel even with
     memory under 896M, and gives the error
     "ELF core (kcore) parse failed". It needs the kexec-tools fix below:
     Commit ed15ba1b9977 ("build_mem_phdrs(): check if p_paddr is invalid")
  3. Even with a patched kexec-tools that fixes 2), it still needs some
     other fixes to work correctly with KASLR-enabled kernels.

So the situation is:
* Old kexec-tools is already broken with the latest kernels.
* We cannot keep this limitation forever just for compatibility with very
  old kexec-tools.
* If one must use old tools, then he/she can choose crashkernel=X@Y.
* People have reported bugs where crashkernel=384M failed because KASLR
  makes the 0-896M space sparse.
* crashkernel can reserve in the low or the high area, and it is natural
  to understand low as memory under 4G.

Hence drop the 896M limitation, and change the crashkernel low reservation
to reserve under 4G by default.

Signed-off-by: Dave Young <dyoung@redhat.com>
---
 arch/x86/kernel/setup.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -71,6 +71,7 @@
 #include <linux/tboot.h>
 #include <linux/jiffies.h>
 #include <linux/mem_encrypt.h>
+#include <linux/sizes.h>
 
 #include <linux/usb/xhci-dbgp.h>
 #include <video/edid.h>
@@ -448,18 +449,17 @@ static void __init memblock_x86_reserve_
 #ifdef CONFIG_KEXEC_CORE
 
 /* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		(16 << 20)
+#define CRASH_ALIGN		SZ_16M
 
 /*
  * Keep the crash kernel below this limit.  On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
- * On 64bit, old kexec-tools need to under 896MiB.
  */
 #ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX	(512 << 20)
-# define CRASH_ADDR_HIGH_MAX	(512 << 20)
+# define CRASH_ADDR_LOW_MAX	SZ_512M
+# define CRASH_ADDR_HIGH_MAX	SZ_512M
 #else
-# define CRASH_ADDR_LOW_MAX	(896UL << 20)
+# define CRASH_ADDR_LOW_MAX	SZ_4G
 # define CRASH_ADDR_HIGH_MAX	MAXMEM
 #endif
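
As a rough illustration of what the new ceiling means, here is a small
user-space sketch (not kernel code). It checks whether a candidate region
is 16M-aligned and ends at or below a given limit. The helper name
crash_region_fits() and the SKETCH_* constants are made up for this
example; their values only mirror CRASH_ALIGN and the old and new x86_64
CRASH_ADDR_LOW_MAX from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants mirroring the values in the patch (not the kernel macros). */
#define SKETCH_ALIGN     (UINT64_C(16) << 20)   /* CRASH_ALIGN: 16M */
#define SKETCH_OLD_LIMIT (UINT64_C(896) << 20)  /* old x86_64 CRASH_ADDR_LOW_MAX */
#define SKETCH_NEW_LIMIT (UINT64_C(4) << 30)    /* new x86_64 CRASH_ADDR_LOW_MAX: 4G */

/*
 * Hypothetical helper: true if [base, base + size) starts on a 16M
 * boundary and ends at or below 'limit'.
 */
static bool crash_region_fits(uint64_t base, uint64_t size, uint64_t limit)
{
	if (base % SKETCH_ALIGN != 0)
		return false;
	if (size == 0 || size > limit)
		return false;
	/* Written as a subtraction so base + size cannot overflow. */
	return base <= limit - size;
}
```

With these values, a 384M region starting at 1G is accepted under the 4G
ceiling but rejected under the old 896M one.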
 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory
  2019-04-21  3:50 [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default Dave Young
@ 2019-04-21  3:51 ` Dave Young
  2019-04-21 18:26   ` Ingo Molnar
  2019-04-22  3:19   ` [PATCH 2/2 update] " Dave Young
  2019-04-21 18:24 ` [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default Ingo Molnar
  2019-04-22  8:28 ` [tip:x86/kdump] x86/kdump: Have crashkernel=X " tip-bot for Dave Young
  2 siblings, 2 replies; 9+ messages in thread
From: Dave Young @ 2019-04-21  3:51 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, vgoyal, bhe, piliu, Yinghai Lu,
	Eric Biederman

crashkernel=xM tries to reserve crashkernel memory under 4G, which
is enough for the usual cases.  But this can fail sometimes, for example
when one tries to reserve a big chunk like 2G.

So let crashkernel=xM fall back to high memory in case it fails to find
a suitable low range.  Do not make ,high the default, because it
allocates extra low memory for DMA buffers and swiotlb, which is not
necessary on all machines. A typical setting like crashkernel=128M
usually works with a low reservation under 4G, so keep <4G as the default.
Signed-off-by: Dave Young <dyoung@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |    7 +++++--
 arch/x86/kernel/setup.c                         |   22 ++++++++++++++--------
 2 files changed, 19 insertions(+), 10 deletions(-)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
 	}
 
 	/* 0 means: find the address automatically */
-	if (crash_base <= 0) {
+	if (!crash_base) {
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * as old kexec-tools loads bzImage below that, unless
-		 * "crashkernel=size[KMG],high" is specified.
+		 * as crashkernel=x,high allocs memory over 4G, also allocs
+		 * 256M extra low memory for DMA buffers and swiotlb.
+		 * but the extra memory is not required for all machines.
+		 * So prefer low memory first, and fallback to high memory
+		 * unless "crashkernel=size[KMG],high" is specified.
 		 */
-		crash_base = memblock_find_in_range(CRASH_ALIGN,
-						    high ? CRASH_ADDR_HIGH_MAX
-							 : CRASH_ADDR_LOW_MAX,
-						    crash_size, CRASH_ALIGN);
+		if (!high)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_LOW_MAX,
+						crash_size, CRASH_ALIGN);
+		if (!crash_base)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_HIGH_MAX,
+						crash_size, CRASH_ALIGN);
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
-
 	} else {
 		unsigned long long start;
 
--- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
@@ -704,8 +704,11 @@
 			upon panic. This parameter reserves the physical
 			memory region [offset, offset + size] for that kernel
 			image. If '@offset' is omitted, then a suitable offset
-			is selected automatically. Check
-			Documentation/kdump/kdump.txt for further details.
+			is selected automatically.
+			[KNL, x86_64] select a region under 4G first, and
+			fallback to reserve region above 4G in case without
+			'@offset'.
+			See Documentation/kdump/kdump.txt for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
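
The control flow added to reserve_crashkernel() can be modelled in user
space roughly as follows. This is only a sketch under stated assumptions:
struct free_range, sketch_find_in_range() and sketch_pick_crash_base() are
hypothetical names, the free-memory map is a plain sorted array, and
SKETCH_HIGH_MAX is an assumed stand-in for MAXMEM. The toy search walks
from the top of the range downward, which is believed to match the usual
memblock default but should be treated as an assumption too.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGN    (UINT64_C(16) << 20)  /* CRASH_ALIGN */
#define SKETCH_LOW_MAX  (UINT64_C(4) << 30)   /* CRASH_ADDR_LOW_MAX on x86_64 */
#define SKETCH_HIGH_MAX (UINT64_C(64) << 40)  /* assumed stand-in for MAXMEM */

struct free_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * Toy stand-in for memblock_find_in_range(): return the highest aligned
 * base inside [start, end) where 'size' bytes fit in one free range, or 0
 * if nothing fits.  'map' must be sorted by ascending address.
 */
static uint64_t sketch_find_in_range(const struct free_range *map, size_t n,
				     uint64_t start, uint64_t end,
				     uint64_t size, uint64_t align)
{
	for (size_t i = n; i-- > 0; ) {
		uint64_t lo = map[i].start > start ? map[i].start : start;
		uint64_t hi = map[i].end < end ? map[i].end : end;
		uint64_t base;

		if (hi <= lo || hi - lo < size)
			continue;
		base = (hi - size) / align * align;	/* round down to alignment */
		if (base >= lo)
			return base;
	}
	return 0;
}

/* Mirrors the patched logic: try low first unless ",high", then try high. */
static uint64_t sketch_pick_crash_base(const struct free_range *map, size_t n,
				       uint64_t size, int high)
{
	uint64_t base = 0;

	if (!high)
		base = sketch_find_in_range(map, n, SKETCH_ALIGN,
					    SKETCH_LOW_MAX, size, SKETCH_ALIGN);
	if (!base)
		base = sketch_find_in_range(map, n, SKETCH_ALIGN,
					    SKETCH_HIGH_MAX, size, SKETCH_ALIGN);
	return base;
}
```

In this model, a small request without ",high" lands below 4G, a request
too large for the low ranges falls through to memory above 4G, and a
request that fits nowhere returns 0, which corresponds to the "No suitable
area found" path in the patch.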




* Re: [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default
  2019-04-21  3:50 [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default Dave Young
  2019-04-21  3:51 ` [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory Dave Young
@ 2019-04-21 18:24 ` Ingo Molnar
  2019-04-22  8:28 ` [tip:x86/kdump] x86/kdump: Have crashkernel=X " tip-bot for Dave Young
  2 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2019-04-21 18:24 UTC (permalink / raw)
  To: Dave Young
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, vgoyal, bhe, piliu, Yinghai Lu,
	Eric Biederman


* Dave Young <dyoung@redhat.com> wrote:

> The kdump crashkernel low reservation is limited to under 896M even for
> X86_64. This obscure and miserable limitation exists for old kexec-tools
> compatibility, but the reason is not documented anywhere.
> 
> Some more tests/investigations about the background:
> a) Previously old kexec-tools can only load purgatory to memory under 2G,
>    Eric remove that limitation in 2012 in kexec-tools:
>    Commit b4f9f8599679 ("kexec x86_64: Make purgatory relocatable anywhere
>    in the 64bit address space.")
> 
> b) back in 2013 Yinghai removed all the limitations in new kexec-tools,
>    bzImage64 can be loaded to anywhere.
>    Commit 82c3dd2280d2 ("kexec, x86_64: Load bzImage64 above 4G")
> 
> c) test results with old kexec-tools with old and latest kernels.
>   1. old kexec-tools can not build with modern toolchain anymore,
>      I built it in a RHEL6 vm
>   2. 2.0.0 kexec-tools does not work with latest kernel even with
>      memory under 896M and give an error:
>      "ELF core (kcore) parse failed", it needs below kexec-tools fix 
>      Commit ed15ba1b9977 ("build_mem_phdrs(): check if p_paddr is invalid")
>   3. even with patched kexec-tools which fixes 2),  it still needs some
>      other fixes to work correctly for kaslr enabled kernels.
> 
> So the situation is:
> * old kexec-tools is already broken with latest kernels
> * we can not keep this limitations forever just for compatibility of very
>   old kexec-tools.
> * If one must use old tools then he/she can choose crashkernel=X@Y
> * people have reported bugs crashkernel=384M failed because kaslr makes
>   the 0-896M space sparse, 
> * crashkernel can reserve in low or high area, it is natural to understand 
>   low as memory under 4G
> 
> Hence drop the 896M limitation, and change crashkernel low reservation to
> reserve under 4G by default.
> 
> Signed-off-by: Dave Young <dyoung@redhat.com>
> ---
>  arch/x86/kernel/setup.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> --- linux-x86.orig/arch/x86/kernel/setup.c
> +++ linux-x86/arch/x86/kernel/setup.c
> @@ -71,6 +71,7 @@
>  #include <linux/tboot.h>
>  #include <linux/jiffies.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/sizes.h>
>  
>  #include <linux/usb/xhci-dbgp.h>
>  #include <video/edid.h>
> @@ -448,18 +449,17 @@ static void __init memblock_x86_reserve_
>  #ifdef CONFIG_KEXEC_CORE
>  
>  /* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN		(16 << 20)
> +#define CRASH_ALIGN		SZ_16M
>  
>  /*
>   * Keep the crash kernel below this limit.  On 32 bits earlier kernels
>   * would limit the kernel to the low 512 MiB due to mapping restrictions.
> - * On 64bit, old kexec-tools need to under 896MiB.
>   */
>  #ifdef CONFIG_X86_32
> -# define CRASH_ADDR_LOW_MAX	(512 << 20)
> -# define CRASH_ADDR_HIGH_MAX	(512 << 20)
> +# define CRASH_ADDR_LOW_MAX	SZ_512M
> +# define CRASH_ADDR_HIGH_MAX	SZ_512M
>  #else
> -# define CRASH_ADDR_LOW_MAX	(896UL << 20)
> +# define CRASH_ADDR_LOW_MAX	SZ_4G
>  # define CRASH_ADDR_HIGH_MAX	MAXMEM
>  #endif

Reviewed-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo


* Re: [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory
  2019-04-21  3:51 ` [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory Dave Young
@ 2019-04-21 18:26   ` Ingo Molnar
  2019-04-22  3:03     ` Dave Young
  2019-04-22  3:19   ` [PATCH 2/2 update] " Dave Young
  1 sibling, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2019-04-21 18:26 UTC (permalink / raw)
  To: Dave Young
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, vgoyal, bhe, piliu, Yinghai Lu,
	Eric Biederman


* Dave Young <dyoung@redhat.com> wrote:

> crashkernel=xM tries to reserve crashkernel memory under 4G, which
> is enough for usual cases.  But this could fail sometimes, for example
> one tries to reserve a big chunk like 2G, it is possible to fail.
> 
> So let the crashkernel=xM just fall back to use high memory in case it
> fails to find a suitable low range.  Do not set the ,high as default
> because it allocs extra low memory for DMA buffers and swiotlb, this is
> not always necessary for all machines. Typically like crashkernel=128M
> usually work with low reservation under 4G, so still keep <4G as default.
> 
> Signed-off-by: Dave Young <dyoung@redhat.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |    7 +++++--
>  arch/x86/kernel/setup.c                         |   22 ++++++++++++++--------
>  2 files changed, 19 insertions(+), 10 deletions(-)
> 
> --- linux-x86.orig/arch/x86/kernel/setup.c
> +++ linux-x86/arch/x86/kernel/setup.c
> @@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
>  	}
>  
>  	/* 0 means: find the address automatically */
> -	if (crash_base <= 0) {
> +	if (!crash_base) {
>  		/*
>  		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> -		 * as old kexec-tools loads bzImage below that, unless
> -		 * "crashkernel=size[KMG],high" is specified.
> +		 * as crashkernel=x,high allocs memory over 4G, also allocs

s/allocs
 /allocates

> +		 * 256M extra low memory for DMA buffers and swiotlb.
> +		 * but the extra memory is not required for all machines.
> +		 * So prefer low memory first, and fallback to high memory

s/fallback
 /fall back

> +		 * unless "crashkernel=size[KMG],high" is specified.
>  		 */
> -		crash_base = memblock_find_in_range(CRASH_ALIGN,
> -						    high ? CRASH_ADDR_HIGH_MAX
> -							 : CRASH_ADDR_LOW_MAX,
> -						    crash_size, CRASH_ALIGN);
> +		if (!high)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_LOW_MAX,
> +						crash_size, CRASH_ALIGN);
> +		if (!crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX,
> +						crash_size, CRASH_ALIGN);
>  		if (!crash_base) {
>  			pr_info("crashkernel reservation failed - No suitable area found.\n");
>  			return;
>  		}
> -
>  	} else {
>  		unsigned long long start;
>  
> --- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
> +++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
> @@ -704,8 +704,11 @@
>  			upon panic. This parameter reserves the physical
>  			memory region [offset, offset + size] for that kernel
>  			image. If '@offset' is omitted, then a suitable offset
> -			is selected automatically. Check
> -			Documentation/kdump/kdump.txt for further details.
> +			is selected automatically.
> +			[KNL, x86_64] select a region under 4G first, and
> +			fallback to reserve region above 4G in case without

s/fallback
 /fall back

> +			'@offset'.
> +			See Documentation/kdump/kdump.txt for further details.
>  
>  	crashkernel=range1:size1[,range2:size2,...][@offset]
>  			[KNL] Same as above, but depends on the memory

With the nits fixed:

Reviewed-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo


* Re: [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory
  2019-04-21 18:26   ` Ingo Molnar
@ 2019-04-22  3:03     ` Dave Young
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Young @ 2019-04-22  3:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, vgoyal, bhe, piliu, Yinghai Lu,
	Eric Biederman

On 04/21/19 at 08:26pm, Ingo Molnar wrote:
> 
> * Dave Young <dyoung@redhat.com> wrote:
> 
> > crashkernel=xM tries to reserve crashkernel memory under 4G, which
> > is enough for usual cases.  But this could fail sometimes, for example
> > one tries to reserve a big chunk like 2G, it is possible to fail.
> > 
> > So let the crashkernel=xM just fall back to use high memory in case it
> > fails to find a suitable low range.  Do not set the ,high as default
> > because it allocs extra low memory for DMA buffers and swiotlb, this is
> > not always necessary for all machines. Typically like crashkernel=128M
> > usually work with low reservation under 4G, so still keep <4G as default.
> > 
> > Signed-off-by: Dave Young <dyoung@redhat.com>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |    7 +++++--
> >  arch/x86/kernel/setup.c                         |   22 ++++++++++++++--------
> >  2 files changed, 19 insertions(+), 10 deletions(-)
> > 
> > --- linux-x86.orig/arch/x86/kernel/setup.c
> > +++ linux-x86/arch/x86/kernel/setup.c
> > @@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
> >  	}
> >  
> >  	/* 0 means: find the address automatically */
> > -	if (crash_base <= 0) {
> > +	if (!crash_base) {
> >  		/*
> >  		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> > -		 * as old kexec-tools loads bzImage below that, unless
> > -		 * "crashkernel=size[KMG],high" is specified.
> > +		 * as crashkernel=x,high allocs memory over 4G, also allocs
> 
> s/allocs
>  /allocates
> 
> > +		 * 256M extra low memory for DMA buffers and swiotlb.
> > +		 * but the extra memory is not required for all machines.
> > +		 * So prefer low memory first, and fallback to high memory
> 
> s/fallback
>  /fall back
> 
> > +		 * unless "crashkernel=size[KMG],high" is specified.
> >  		 */
> > -		crash_base = memblock_find_in_range(CRASH_ALIGN,
> > -						    high ? CRASH_ADDR_HIGH_MAX
> > -							 : CRASH_ADDR_LOW_MAX,
> > -						    crash_size, CRASH_ALIGN);
> > +		if (!high)
> > +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +						CRASH_ADDR_LOW_MAX,
> > +						crash_size, CRASH_ALIGN);
> > +		if (!crash_base)
> > +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +						CRASH_ADDR_HIGH_MAX,
> > +						crash_size, CRASH_ALIGN);
> >  		if (!crash_base) {
> >  			pr_info("crashkernel reservation failed - No suitable area found.\n");
> >  			return;
> >  		}
> > -
> >  	} else {
> >  		unsigned long long start;
> >  
> > --- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
> > +++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
> > @@ -704,8 +704,11 @@
> >  			upon panic. This parameter reserves the physical
> >  			memory region [offset, offset + size] for that kernel
> >  			image. If '@offset' is omitted, then a suitable offset
> > -			is selected automatically. Check
> > -			Documentation/kdump/kdump.txt for further details.
> > +			is selected automatically.
> > +			[KNL, x86_64] select a region under 4G first, and
> > +			fallback to reserve region above 4G in case without
> 
> s/fallback
>  /fall back
> 
> > +			'@offset'.
> > +			See Documentation/kdump/kdump.txt for further details.
> >  
> >  	crashkernel=range1:size1[,range2:size2,...][@offset]
> >  			[KNL] Same as above, but depends on the memory
> 
> With the nits fixed:
> 
> Reviewed-by: Ingo Molnar <mingo@kernel.org>

Thanks for the review. I will reply to 2/2 with an update that fixes
those spelling issues.

Dave


* [PATCH 2/2 update] X86/kdump: fall back to reserve high crashkernel memory
  2019-04-21  3:51 ` [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory Dave Young
  2019-04-21 18:26   ` Ingo Molnar
@ 2019-04-22  3:19   ` Dave Young
  2019-04-22  3:29     ` Baoquan He
  2019-04-22  8:28     ` [tip:x86/kdump] x86/kdump: Fall " tip-bot for Dave Young
  1 sibling, 2 replies; 9+ messages in thread
From: Dave Young @ 2019-04-22  3:19 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, vgoyal, bhe, piliu, Yinghai Lu,
	Eric Biederman

crashkernel=xM tries to reserve crashkernel memory under 4G, which
is enough for the usual cases.  But this can fail sometimes, for example
when one tries to reserve a big chunk like 2G.

So let crashkernel=xM fall back to high memory in case it fails to find
a suitable low range.  Do not make ,high the default, because it
allocates extra low memory for DMA buffers and swiotlb, which is not
necessary on all machines. A typical setting like crashkernel=128M
usually works with a low reservation under 4G, so keep <4G as the default.

Signed-off-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |    7 +++++--
 arch/x86/kernel/setup.c                         |   22 ++++++++++++++--------
 2 files changed, 19 insertions(+), 10 deletions(-)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
 	}
 
 	/* 0 means: find the address automatically */
-	if (crash_base <= 0) {
+	if (!crash_base) {
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * as old kexec-tools loads bzImage below that, unless
-		 * "crashkernel=size[KMG],high" is specified.
+		 * crashkernel=x,high reserves memory over 4G, also allocates
+		 * 256M extra low memory for DMA buffers and swiotlb.
+		 * but the extra memory is not required for all machines.
+		 * So prefer low memory first, and fall back to high memory
+		 * unless "crashkernel=size[KMG],high" is specified.
 		 */
-		crash_base = memblock_find_in_range(CRASH_ALIGN,
-						    high ? CRASH_ADDR_HIGH_MAX
-							 : CRASH_ADDR_LOW_MAX,
-						    crash_size, CRASH_ALIGN);
+		if (!high)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_LOW_MAX,
+						crash_size, CRASH_ALIGN);
+		if (!crash_base)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_HIGH_MAX,
+						crash_size, CRASH_ALIGN);
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
-
 	} else {
 		unsigned long long start;
 
--- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
@@ -704,8 +704,11 @@
 			upon panic. This parameter reserves the physical
 			memory region [offset, offset + size] for that kernel
 			image. If '@offset' is omitted, then a suitable offset
-			is selected automatically. Check
-			Documentation/kdump/kdump.txt for further details.
+			is selected automatically.
+			[KNL, x86_64] select a region under 4G first, and
+			fall back to reserve region above 4G in case without
+			'@offset'.
+			See Documentation/kdump/kdump.txt for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
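
A side note on the first hunk: replacing "crash_base <= 0" with
"!crash_base" should not change behaviour, assuming crash_base is an
unsigned 64-bit variable (as the neighbouring "unsigned long long start"
declaration suggests). A minimal user-space sketch, with the hypothetical
helper names old_check() and new_check():

```c
#include <stdbool.h>

/* For an unsigned value, "<= 0" can only be true when the value is 0. */
static bool old_check(unsigned long long crash_base)
{
	return crash_base <= 0;
}

/* The spelling used after the patch; equivalent for unsigned types. */
static bool new_check(unsigned long long crash_base)
{
	return !crash_base;
}
```

Both helpers agree for every input, so the new spelling simply states the
intent ("0 means: find the address automatically") more directly.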


* Re: [PATCH 2/2 update] X86/kdump: fall back to reserve high crashkernel memory
  2019-04-22  3:19   ` [PATCH 2/2 update] " Dave Young
@ 2019-04-22  3:29     ` Baoquan He
  2019-04-22  8:28     ` [tip:x86/kdump] x86/kdump: Fall " tip-bot for Dave Young
  1 sibling, 0 replies; 9+ messages in thread
From: Baoquan He @ 2019-04-22  3:29 UTC (permalink / raw)
  To: Dave Young
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, linux-kernel, vgoyal, piliu, Yinghai Lu, Eric Biederman

On 04/22/19 at 11:19am, Dave Young wrote:
> crashkernel=xM tries to reserve crashkernel memory under 4G, which
> is enough for usual cases.  But this could fail sometimes, for example
> one tries to reserve a big chunk like 2G, it is possible to fail.
> 
> So let the crashkernel=xM just fall back to use high memory in case it
> fails to find a suitable low range.  Do not set the ,high as default
> because it allocates extra low memory for DMA buffers and swiotlb, this is
> not always necessary for all machines. Typically like crashkernel=128M
> usually work with low reservation under 4G, so still keep <4G as default.
> 
> Signed-off-by: Dave Young <dyoung@redhat.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---

Ack the whole series, thanks for the effort.

Acked-by: Baoquan He <bhe@redhat.com>

>  Documentation/admin-guide/kernel-parameters.txt |    7 +++++--
>  arch/x86/kernel/setup.c                         |   22 ++++++++++++++--------
>  2 files changed, 19 insertions(+), 10 deletions(-)
> 
> --- linux-x86.orig/arch/x86/kernel/setup.c
> +++ linux-x86/arch/x86/kernel/setup.c
> @@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
>  	}
>  
>  	/* 0 means: find the address automatically */
> -	if (crash_base <= 0) {
> +	if (!crash_base) {
>  		/*
>  		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> -		 * as old kexec-tools loads bzImage below that, unless
> -		 * "crashkernel=size[KMG],high" is specified.
> +		 * crashkernel=x,high reserves memory over 4G, also allocates
> +		 * 256M extra low memory for DMA buffers and swiotlb.
> +		 * but the extra memory is not required for all machines.
> +		 * So prefer low memory first, and fall back to high memory
> +		 * unless "crashkernel=size[KMG],high" is specified.
>  		 */
> -		crash_base = memblock_find_in_range(CRASH_ALIGN,
> -						    high ? CRASH_ADDR_HIGH_MAX
> -							 : CRASH_ADDR_LOW_MAX,
> -						    crash_size, CRASH_ALIGN);
> +		if (!high)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_LOW_MAX,
> +						crash_size, CRASH_ALIGN);
> +		if (!crash_base)
> +			crash_base = memblock_find_in_range(CRASH_ALIGN,
> +						CRASH_ADDR_HIGH_MAX,
> +						crash_size, CRASH_ALIGN);
>  		if (!crash_base) {
>  			pr_info("crashkernel reservation failed - No suitable area found.\n");
>  			return;
>  		}
> -
>  	} else {
>  		unsigned long long start;
>  
> --- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
> +++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
> @@ -704,8 +704,11 @@
>  			upon panic. This parameter reserves the physical
>  			memory region [offset, offset + size] for that kernel
>  			image. If '@offset' is omitted, then a suitable offset
> -			is selected automatically. Check
> -			Documentation/kdump/kdump.txt for further details.
> +			is selected automatically.
> +			[KNL, x86_64] select a region under 4G first, and
> +			fall back to reserve region above 4G in case without
> +			'@offset'.
> +			See Documentation/kdump/kdump.txt for further details.
>  
>  	crashkernel=range1:size1[,range2:size2,...][@offset]
>  			[KNL] Same as above, but depends on the memory


* [tip:x86/kdump] x86/kdump: Have crashkernel=X reserve under 4G by default
  2019-04-21  3:50 [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default Dave Young
  2019-04-21  3:51 ` [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory Dave Young
  2019-04-21 18:24 ` [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default Ingo Molnar
@ 2019-04-22  8:28 ` tip-bot for Dave Young
  2 siblings, 0 replies; 9+ messages in thread
From: tip-bot for Dave Young @ 2019-04-22  8:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dyoung, x86, mingo, linux-kernel, ptesarik, dhowells, bhe,
	jgross, tglx, kookoo.gu, ebiederm, hpa, bp, linuxram, okaya,
	mingo, dave.hansen, yinghai

Commit-ID:  9ca5c8e632ce8f144ec6d00da2dc5e16b41d593c
Gitweb:     https://git.kernel.org/tip/9ca5c8e632ce8f144ec6d00da2dc5e16b41d593c
Author:     Dave Young <dyoung@redhat.com>
AuthorDate: Sun, 21 Apr 2019 11:50:59 +0800
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Mon, 22 Apr 2019 10:15:16 +0200

x86/kdump: Have crashkernel=X reserve under 4G by default

The kdump crashkernel low reservation is limited to under 896M even for
X86_64. This obscure and miserable limitation exists for compatibility
with old kexec-tools but the reason is not documented anywhere.

Some more tests/investigations about the background:

a) Previously, old kexec-tools could only load purgatory to memory under
   2G. Eric removed that limitation in 2012 in kexec-tools:

     b4f9f8599679 ("kexec x86_64: Make purgatory relocatable anywhere
		   in the 64bit address space.")

b) Back in 2013 Yinghai removed all the limitations in new kexec-tools,
   bzImage64 can be loaded anywhere:

     82c3dd2280d2 ("kexec, x86_64: Load bzImage64 above 4G")

c) Test results with old kexec-tools with old and latest kernels:

  1. Old kexec-tools can not build with modern toolchain anymore,
     I built it in a RHEL6 vm.

  2. 2.0.0 kexec-tools does not work with the latest kernel even with
     memory under 896M and gives an error:

     "ELF core (kcore) parse failed"

     For that it needs below kexec-tools fix:

       ed15ba1b9977 ("build_mem_phdrs(): check if p_paddr is invalid")

  3. Even with patched kexec-tools which fixes 2), it still needs some
     other fixes to work correctly for KASLR-enabled kernels.

So the situation is:

* Old kexec-tools is already broken with latest kernels.

* We can not keep these limitations forever just for compatibility with very
  old kexec-tools.

* If one must use old tools then he/she can choose crashkernel=X@Y.

* People have reported bugs where crashkernel=384M failed because KASLR
  makes the 0-896M space sparse.

* Crashkernel can reserve in low or high area, it is natural to understand
  low as memory under 4G.

Hence drop the 896M limitation and change crashkernel low reservation to
reserve under 4G by default.

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: piliu@redhat.com
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vgoyal@redhat.com
Cc: x86-ml <x86@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Zhimin Gu <kookoo.gu@intel.com>
Link: https://lkml.kernel.org/r/20190421035058.943630505@redhat.com
---
 arch/x86/kernel/setup.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a527cd9..daf7c5650c18 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -71,6 +71,7 @@
 #include <linux/tboot.h>
 #include <linux/jiffies.h>
 #include <linux/mem_encrypt.h>
+#include <linux/sizes.h>
 
 #include <linux/usb/xhci-dbgp.h>
 #include <video/edid.h>
@@ -448,18 +449,17 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 #ifdef CONFIG_KEXEC_CORE
 
 /* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		(16 << 20)
+#define CRASH_ALIGN		SZ_16M
 
 /*
  * Keep the crash kernel below this limit.  On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
- * On 64bit, old kexec-tools need to under 896MiB.
  */
 #ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX	(512 << 20)
-# define CRASH_ADDR_HIGH_MAX	(512 << 20)
+# define CRASH_ADDR_LOW_MAX	SZ_512M
+# define CRASH_ADDR_HIGH_MAX	SZ_512M
 #else
-# define CRASH_ADDR_LOW_MAX	(896UL << 20)
+# define CRASH_ADDR_LOW_MAX	SZ_4G
 # define CRASH_ADDR_HIGH_MAX	MAXMEM
 #endif
 


* [tip:x86/kdump] x86/kdump: Fall back to reserve high crashkernel memory
  2019-04-22  3:19   ` [PATCH 2/2 update] " Dave Young
  2019-04-22  3:29     ` Baoquan He
@ 2019-04-22  8:28     ` tip-bot for Dave Young
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Dave Young @ 2019-04-22  8:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: corbet, gregkh, konrad.wilk, bp, paulmck, jkosina, okaya, jgross,
	hpa, linuxram, thymovanbeers, x86, kookoo.gu, ptesarik, mingo,
	mingo, linux-kernel, ebiederm, yinghai, dhowells, bhe, tglx,
	keescook, dyoung

Commit-ID:  b9ac3849af412fd3887d7652bdbabf29d2aecc16
Gitweb:     https://git.kernel.org/tip/b9ac3849af412fd3887d7652bdbabf29d2aecc16
Author:     Dave Young <dyoung@redhat.com>
AuthorDate: Mon, 22 Apr 2019 11:19:05 +0800
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Mon, 22 Apr 2019 10:23:05 +0200

x86/kdump: Fall back to reserve high crashkernel memory

crashkernel=xM tries to reserve memory for the crash kernel under 4G,
which is enough, usually. But this could fail sometimes, for example
when one tries to reserve a big chunk like 2G.

So let crashkernel=xM fall back to high memory when it fails to find a
suitable low range. Do not make ,high the default, because it allocates
extra low memory for DMA buffers and swiotlb, and not all machines need
that.

crashkernel=128M usually works with a low reservation under 4G, so keep
<4G as the default.
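
The low-first, high-fallback order described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: struct free_range, find_in_range() (a toy stand-in for memblock_find_in_range() that picks the highest aligned fit), pick_crash_base() and the SKETCH_* constants are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGN   (UINT64_C(16) << 20)  /* mirrors CRASH_ALIGN */
#define SKETCH_LOW_MAX (UINT64_C(4) << 30)   /* mirrors the 64-bit CRASH_ADDR_LOW_MAX */

/* Hypothetical free physical range [start, end). */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Toy stand-in for memblock_find_in_range(): returns the highest aligned
 * base of a free block of 'size' bytes inside [lo, hi), or 0 if none fits. */
static uint64_t find_in_range(const struct free_range *r, size_t n,
			      uint64_t lo, uint64_t hi,
			      uint64_t size, uint64_t align)
{
	uint64_t best = 0;

	assert(align && (align & (align - 1)) == 0); /* power of two */
	for (size_t i = 0; i < n; i++) {
		uint64_t start = r[i].start > lo ? r[i].start : lo;
		uint64_t end = r[i].end < hi ? r[i].end : hi;
		uint64_t base;

		if (end < size)
			continue;
		base = (end - size) & ~(align - 1); /* round down */
		if (base >= start && base > best)
			best = base;
	}
	return best;
}

/* Try below 4G first unless ",high" was given, then fall back to high memory. */
static uint64_t pick_crash_base(const struct free_range *r, size_t n,
				uint64_t size, bool high, uint64_t high_max)
{
	uint64_t base = 0;

	if (!high)
		base = find_in_range(r, n, SKETCH_ALIGN, SKETCH_LOW_MAX,
				     size, SKETCH_ALIGN);
	if (!base)
		base = find_in_range(r, n, SKETCH_ALIGN, high_max,
				     size, SKETCH_ALIGN);
	return base;
}
```

With free ranges of 16M-1G and 4G-16G, a 128M request stays below 4G, while a 2G request fails the low search and lands above 4G, which is the case the fallback exists for.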

 [ bp: Massage. ]

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: linux-doc@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: piliu@redhat.com
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thymo van Beers <thymovanbeers@gmail.com>
Cc: vgoyal@redhat.com
Cc: x86-ml <x86@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Zhimin Gu <kookoo.gu@intel.com>
Link: https://lkml.kernel.org/r/20190422031905.GA8387@dhcp-128-65.nay.redhat.com
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +++++--
 arch/x86/kernel/setup.c                         | 22 ++++++++++++++--------
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90bb644..24d01648edeb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -704,8 +704,11 @@
 			upon panic. This parameter reserves the physical
 			memory region [offset, offset + size] for that kernel
 			image. If '@offset' is omitted, then a suitable offset
-			is selected automatically. Check
-			Documentation/kdump/kdump.txt for further details.
+			is selected automatically.
+			[KNL, x86_64] select a region under 4G first, and
+			fall back to reserve region above 4G when '@offset'
+			hasn't been specified.
+			See Documentation/kdump/kdump.txt for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index daf7c5650c18..c15f362a2516 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -541,21 +541,27 @@ static void __init reserve_crashkernel(void)
 	}
 
 	/* 0 means: find the address automatically */
-	if (crash_base <= 0) {
+	if (!crash_base) {
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * as old kexec-tools loads bzImage below that, unless
-		 * "crashkernel=size[KMG],high" is specified.
+		 * crashkernel=x,high reserves memory over 4G, also allocates
+		 * 256M extra low memory for DMA buffers and swiotlb.
+		 * But the extra memory is not required for all machines.
+		 * So try low memory first and fall back to high memory
+		 * unless "crashkernel=size[KMG],high" is specified.
 		 */
-		crash_base = memblock_find_in_range(CRASH_ALIGN,
-						    high ? CRASH_ADDR_HIGH_MAX
-							 : CRASH_ADDR_LOW_MAX,
-						    crash_size, CRASH_ALIGN);
+		if (!high)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_LOW_MAX,
+						crash_size, CRASH_ALIGN);
+		if (!crash_base)
+			crash_base = memblock_find_in_range(CRASH_ALIGN,
+						CRASH_ADDR_HIGH_MAX,
+						crash_size, CRASH_ALIGN);
 		if (!crash_base) {
 			pr_info("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
-
 	} else {
 		unsigned long long start;
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-04-22  8:33 UTC | newest]

Thread overview: 9+ messages
2019-04-21  3:50 [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default Dave Young
2019-04-21  3:51 ` [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory Dave Young
2019-04-21 18:26   ` Ingo Molnar
2019-04-22  3:03     ` Dave Young
2019-04-22  3:19   ` [PATCH 2/2 update] " Dave Young
2019-04-22  3:29     ` Baoquan He
2019-04-22  8:28     ` [tip:x86/kdump] x86/kdump: Fall " tip-bot for Dave Young
2019-04-21 18:24 ` [PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default Ingo Molnar
2019-04-22  8:28 ` [tip:x86/kdump] x86/kdump: Have crashkernel=X " tip-bot for Dave Young
