linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] x86/setup: consolidate early memory reservations
@ 2021-01-15  8:32 Mike Rapoport
  2021-01-15  8:32 ` [PATCH 1/2] " Mike Rapoport
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Mike Rapoport @ 2021-01-15  8:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Baoquan He, Borislav Petkov, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Mike Rapoport, Qian Cai, Thomas Gleixner,
	Vlastimil Babka, linux-kernel, linux-mm, x86

From: Mike Rapoport <rppt@linux.ibm.com>

Hi,

David noticed that we do some of memblock_reserve() calls after allocations
are possible:

https://lore.kernel.org/lkml/6ba6bde3-1520-5cd0-f987-32d543f0b79f@redhat.com

For now there is no actual problem because in top-down mode we allocate
from the end of the memory and in bottom-up mode we allocate above the
kernel image. But there is a patch in the mm tree that allow bottom-up
allocations below the kernel:

https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com

and with this change we may get a memory corruption if an allocation steps
on some of the firmware areas that are yet to be reserved.

The below patches consolidate early memory reservations done during
setup_arch() so that memory used by firmware, bootloader, kernel text/data
and the memory that should be excluded from the available memory for
whatever other reason is reserved before memblock allocations are possible.

The patches are vs v5.11-rc3-mmots-2021-01-12-02-00 as I think they are
prerequisite for the memblock bottom-up changes, but if needed I can rebase
then on another tree.

Mike Rapoport (2):
  x86/setup: consolidate early memory reservations
  x86/setup: merge several reservations of start of the memory

 arch/x86/kernel/setup.c | 85 +++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 42 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] x86/setup: consolidate early memory reservations
  2021-01-15  8:32 [PATCH 0/2] x86/setup: consolidate early memory reservations Mike Rapoport
@ 2021-01-15  8:32 ` Mike Rapoport
  2021-01-15 10:10   ` David Hildenbrand
                     ` (2 more replies)
  2021-01-15  8:32 ` [PATCH 2/2] x86/setup: merge several reservations of start of the memory Mike Rapoport
  2021-01-15 11:42 ` [PATCH 0/2] x86/setup: consolidate early memory reservations Baoquan He
  2 siblings, 3 replies; 13+ messages in thread
From: Mike Rapoport @ 2021-01-15  8:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Baoquan He, Borislav Petkov, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Mike Rapoport, Qian Cai, Thomas Gleixner,
	Vlastimil Babka, linux-kernel, linux-mm, x86

From: Mike Rapoport <rppt@linux.ibm.com>

The early reservations of memory areas used by the firmware, bootloader,
kernel text and data are spread over setup_arch(). Moreover, some of them
happen *after* memblock allocations, e.g trim_platform_memory_ranges() and
trim_low_memory_range() are called after reserve_real_mode() that allocates
memory.

We did not observe corruption of these memory regions because memblock
always allocates memory either from the end of memory (in top-down mode) or
above the kernel image (in bottom-up mode). However, the bottom up mode is
going to be updated to span the entire memory [1] to avoid limitations
caused by KASLR.

Consolidate early memory reservations in a dedicated function to improve
robustness against future changes. Having the early reservations in one
place also makes it clearer what memory must be reserved before we allow
memblock allocations.

[1] https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/x86/kernel/setup.c | 80 ++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3412c4595efd..32cd2e790a0a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -728,7 +728,38 @@ static void __init trim_low_memory_range(void)
 	 */
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }
-	
+
+static void __init early_reserve_memory(void)
+{
+	/*
+	 * Reserve the memory occupied by the kernel between _text and
+	 * __end_of_kernel_reserve symbols. Any kernel sections after the
+	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
+	 * separate memblock_reserve() or they will be discarded.
+	 */
+	memblock_reserve(__pa_symbol(_text),
+			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
+
+	/*
+	 * Make sure page 0 is always reserved because on systems with
+	 * L1TF its contents can be leaked to user processes.
+	 */
+	memblock_reserve(0, PAGE_SIZE);
+
+	early_reserve_initrd();
+
+	if (efi_enabled(EFI_BOOT))
+		efi_memblock_x86_reserve_range();
+
+	memblock_x86_reserve_range_setup_data();
+
+	reserve_ibft_region();
+	reserve_bios_regions();
+
+	trim_platform_memory_ranges();
+	trim_low_memory_range();
+}
+
 /*
  * Dump out kernel offset information on panic.
  */
@@ -763,29 +794,6 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 
 void __init setup_arch(char **cmdline_p)
 {
-	/*
-	 * Reserve the memory occupied by the kernel between _text and
-	 * __end_of_kernel_reserve symbols. Any kernel sections after the
-	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
-	 * separate memblock_reserve() or they will be discarded.
-	 */
-	memblock_reserve(__pa_symbol(_text),
-			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
-
-	/*
-	 * Make sure page 0 is always reserved because on systems with
-	 * L1TF its contents can be leaked to user processes.
-	 */
-	memblock_reserve(0, PAGE_SIZE);
-
-	early_reserve_initrd();
-
-	/*
-	 * At this point everything still needed from the boot loader
-	 * or BIOS or kernel text should be early reserved or marked not
-	 * RAM in e820. All other memory is free game.
-	 */
-
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 
@@ -909,8 +917,18 @@ void __init setup_arch(char **cmdline_p)
 
 	parse_early_param();
 
-	if (efi_enabled(EFI_BOOT))
-		efi_memblock_x86_reserve_range();
+	/*
+	 * Do some memory reservations *before* memory is added to
+	 * memblock, so memblock allocations won't overwrite it.
+	 * Do it after early param, so we could get (unlikely) panic from
+	 * serial.
+	 *
+	 * After this point everything still needed from the boot loader or
+	 * firmware or kernel text should be early reserved or marked not
+	 * RAM in e820. All other memory is free game.
+	 */
+	early_reserve_memory();
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 	/*
 	 * Memory used by the kernel cannot be hot-removed because Linux
@@ -937,9 +955,6 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_report_nx();
 
-	/* after early param, so could get panic from serial */
-	memblock_x86_reserve_range_setup_data();
-
 	if (acpi_mps_check()) {
 #ifdef CONFIG_X86_LOCAL_APIC
 		disable_apic = 1;
@@ -1031,8 +1046,6 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	find_smp_config();
 
-	reserve_ibft_region();
-
 	early_alloc_pgt_buf();
 
 	/*
@@ -1053,8 +1066,6 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	sev_setup_arch();
 
-	reserve_bios_regions();
-
 	efi_fake_memmap();
 	efi_find_mirror();
 	efi_esrt_init();
@@ -1080,9 +1091,6 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_real_mode();
 
-	trim_platform_memory_ranges();
-	trim_low_memory_range();
-
 	init_mem_mapping();
 
 	idt_setup_early_pf();
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] x86/setup: merge several reservations of start of the memory
  2021-01-15  8:32 [PATCH 0/2] x86/setup: consolidate early memory reservations Mike Rapoport
  2021-01-15  8:32 ` [PATCH 1/2] " Mike Rapoport
@ 2021-01-15  8:32 ` Mike Rapoport
  2021-01-15 10:09   ` David Hildenbrand
  2021-01-25 14:55   ` Borislav Petkov
  2021-01-15 11:42 ` [PATCH 0/2] x86/setup: consolidate early memory reservations Baoquan He
  2 siblings, 2 replies; 13+ messages in thread
From: Mike Rapoport @ 2021-01-15  8:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, Baoquan He, Borislav Petkov, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Mike Rapoport, Qian Cai, Thomas Gleixner,
	Vlastimil Babka, linux-kernel, linux-mm, x86

From: Mike Rapoport <rppt@linux.ibm.com>

Currently the first several pages are reserved both to avoid leaking their
contents on systems with L1TF and to avoid corrupting BIOS memory.

Merge the two memory reservations.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/x86/kernel/setup.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 32cd2e790a0a..3f2fd67240f8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -715,20 +715,6 @@ static int __init parse_reservelow(char *p)
 
 early_param("reservelow", parse_reservelow);
 
-static void __init trim_low_memory_range(void)
-{
-	/*
-	 * A special case is the first 4Kb of memory;
-	 * This is a BIOS owned area, not kernel ram, but generally
-	 * not listed as such in the E820 table.
-	 *
-	 * This typically reserves additional memory (64KiB by default)
-	 * since some BIOSes are known to corrupt low memory.  See the
-	 * Kconfig help text for X86_RESERVE_LOW.
-	 */
-	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
-}
-
 static void __init early_reserve_memory(void)
 {
 	/*
@@ -741,10 +727,18 @@ static void __init early_reserve_memory(void)
 			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
 
 	/*
-	 * Make sure page 0 is always reserved because on systems with
-	 * L1TF its contents can be leaked to user processes.
+	 * The first 4Kb of memory is a BIOS owned area, but generally it is
+	 * not listed as such in the E820 table.
+	 *
+	 * Reserve the first memory page and typically some additional
+	 * memory (64KiB by default) since some BIOSes are known to corrupt
+	 * low memory. See the Kconfig help text for X86_RESERVE_LOW.
+	 *
+	 * In addition, we must make sure page 0 is always reserved because
+	 * on systems with L1TF its contents can be leaked to user
+	 * processes.
 	 */
-	memblock_reserve(0, PAGE_SIZE);
+	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 
 	early_reserve_initrd();
 
@@ -757,7 +751,6 @@ static void __init early_reserve_memory(void)
 	reserve_bios_regions();
 
 	trim_platform_memory_ranges();
-	trim_low_memory_range();
 }
 
 /*
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] x86/setup: merge several reservations of start of the memory
  2021-01-15  8:32 ` [PATCH 2/2] x86/setup: merge several reservations of start of the memory Mike Rapoport
@ 2021-01-15 10:09   ` David Hildenbrand
  2021-01-25 14:55   ` Borislav Petkov
  1 sibling, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-01-15 10:09 UTC (permalink / raw)
  To: Mike Rapoport, Andrew Morton
  Cc: Andrea Arcangeli, Baoquan He, Borislav Petkov, H. Peter Anvin,
	Ingo Molnar, Mel Gorman, Michal Hocko, Mike Rapoport, Qian Cai,
	Thomas Gleixner, Vlastimil Babka, linux-kernel, linux-mm, x86

On 15.01.21 09:32, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Currently the first several pages are reserved both to avoid leaking their
> contents on systems with L1TF and to avoid corrupting BIOS memory.
> 
> Merge the two memory reservations.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/x86/kernel/setup.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 32cd2e790a0a..3f2fd67240f8 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -715,20 +715,6 @@ static int __init parse_reservelow(char *p)
>  
>  early_param("reservelow", parse_reservelow);
>  
> -static void __init trim_low_memory_range(void)
> -{
> -	/*
> -	 * A special case is the first 4Kb of memory;
> -	 * This is a BIOS owned area, not kernel ram, but generally
> -	 * not listed as such in the E820 table.
> -	 *
> -	 * This typically reserves additional memory (64KiB by default)
> -	 * since some BIOSes are known to corrupt low memory.  See the
> -	 * Kconfig help text for X86_RESERVE_LOW.
> -	 */
> -	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> -}
> -
>  static void __init early_reserve_memory(void)
>  {
>  	/*
> @@ -741,10 +727,18 @@ static void __init early_reserve_memory(void)
>  			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
>  
>  	/*
> -	 * Make sure page 0 is always reserved because on systems with
> -	 * L1TF its contents can be leaked to user processes.
> +	 * The first 4Kb of memory is a BIOS owned area, but generally it is
> +	 * not listed as such in the E820 table.
> +	 *
> +	 * Reserve the first memory page and typically some additional
> +	 * memory (64KiB by default) since some BIOSes are known to corrupt
> +	 * low memory. See the Kconfig help text for X86_RESERVE_LOW.
> +	 *
> +	 * In addition, we must make sure page 0 is always reserved because
> +	 * on systems with L1TF its contents can be leaked to user
> +	 * processes.
>  	 */
> -	memblock_reserve(0, PAGE_SIZE);
> +	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
>  
>  	early_reserve_initrd();
>  
> @@ -757,7 +751,6 @@ static void __init early_reserve_memory(void)
>  	reserve_bios_regions();
>  
>  	trim_platform_memory_ranges();
> -	trim_low_memory_range();
>  }
>  
>  /*
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] x86/setup: consolidate early memory reservations
  2021-01-15  8:32 ` [PATCH 1/2] " Mike Rapoport
@ 2021-01-15 10:10   ` David Hildenbrand
  2021-01-25 14:50   ` Borislav Petkov
  2021-01-25 14:59   ` Borislav Petkov
  2 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2021-01-15 10:10 UTC (permalink / raw)
  To: Mike Rapoport, Andrew Morton
  Cc: Andrea Arcangeli, Baoquan He, Borislav Petkov, H. Peter Anvin,
	Ingo Molnar, Mel Gorman, Michal Hocko, Mike Rapoport, Qian Cai,
	Thomas Gleixner, Vlastimil Babka, linux-kernel, linux-mm, x86

On 15.01.21 09:32, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The early reservations of memory areas used by the firmware, bootloader,
> kernel text and data are spread over setup_arch(). Moreover, some of them
> happen *after* memblock allocations, e.g trim_platform_memory_ranges() and
> trim_low_memory_range() are called after reserve_real_mode() that allocates
> memory.
> 
> We did not observe corruption of these memory regions because memblock
> always allocates memory either from the end of memory (in top-down mode) or
> above the kernel image (in bottom-up mode). However, the bottom up mode is
> going to be updated to span the entire memory [1] to avoid limitations
> caused by KASLR.
> 
> Consolidate early memory reservations in a dedicated function to improve
> robustness against future changes. Having the early reservations in one
> place also makes it clearer what memory must be reserved before we allow
> memblock allocations.
> 
> [1] https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/x86/kernel/setup.c | 80 ++++++++++++++++++++++-------------------
>  1 file changed, 44 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3412c4595efd..32cd2e790a0a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -728,7 +728,38 @@ static void __init trim_low_memory_range(void)
>  	 */
>  	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
>  }
> -	
> +
> +static void __init early_reserve_memory(void)
> +{
> +	/*
> +	 * Reserve the memory occupied by the kernel between _text and
> +	 * __end_of_kernel_reserve symbols. Any kernel sections after the
> +	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
> +	 * separate memblock_reserve() or they will be discarded.
> +	 */
> +	memblock_reserve(__pa_symbol(_text),
> +			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
> +
> +	/*
> +	 * Make sure page 0 is always reserved because on systems with
> +	 * L1TF its contents can be leaked to user processes.
> +	 */
> +	memblock_reserve(0, PAGE_SIZE);
> +
> +	early_reserve_initrd();
> +
> +	if (efi_enabled(EFI_BOOT))
> +		efi_memblock_x86_reserve_range();
> +
> +	memblock_x86_reserve_range_setup_data();
> +
> +	reserve_ibft_region();
> +	reserve_bios_regions();
> +
> +	trim_platform_memory_ranges();
> +	trim_low_memory_range();
> +}
> +
>  /*
>   * Dump out kernel offset information on panic.
>   */
> @@ -763,29 +794,6 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>  
>  void __init setup_arch(char **cmdline_p)
>  {
> -	/*
> -	 * Reserve the memory occupied by the kernel between _text and
> -	 * __end_of_kernel_reserve symbols. Any kernel sections after the
> -	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
> -	 * separate memblock_reserve() or they will be discarded.
> -	 */
> -	memblock_reserve(__pa_symbol(_text),
> -			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
> -
> -	/*
> -	 * Make sure page 0 is always reserved because on systems with
> -	 * L1TF its contents can be leaked to user processes.
> -	 */
> -	memblock_reserve(0, PAGE_SIZE);
> -
> -	early_reserve_initrd();
> -
> -	/*
> -	 * At this point everything still needed from the boot loader
> -	 * or BIOS or kernel text should be early reserved or marked not
> -	 * RAM in e820. All other memory is free game.
> -	 */
> -
>  #ifdef CONFIG_X86_32
>  	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
>  
> @@ -909,8 +917,18 @@ void __init setup_arch(char **cmdline_p)
>  
>  	parse_early_param();
>  
> -	if (efi_enabled(EFI_BOOT))
> -		efi_memblock_x86_reserve_range();
> +	/*
> +	 * Do some memory reservations *before* memory is added to
> +	 * memblock, so memblock allocations won't overwrite it.
> +	 * Do it after early param, so we could get (unlikely) panic from
> +	 * serial.
> +	 *
> +	 * After this point everything still needed from the boot loader or
> +	 * firmware or kernel text should be early reserved or marked not
> +	 * RAM in e820. All other memory is free game.
> +	 */
> +	early_reserve_memory();
> +
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	/*
>  	 * Memory used by the kernel cannot be hot-removed because Linux
> @@ -937,9 +955,6 @@ void __init setup_arch(char **cmdline_p)
>  
>  	x86_report_nx();
>  
> -	/* after early param, so could get panic from serial */
> -	memblock_x86_reserve_range_setup_data();
> -
>  	if (acpi_mps_check()) {
>  #ifdef CONFIG_X86_LOCAL_APIC
>  		disable_apic = 1;
> @@ -1031,8 +1046,6 @@ void __init setup_arch(char **cmdline_p)
>  	 */
>  	find_smp_config();
>  
> -	reserve_ibft_region();
> -
>  	early_alloc_pgt_buf();
>  
>  	/*
> @@ -1053,8 +1066,6 @@ void __init setup_arch(char **cmdline_p)
>  	 */
>  	sev_setup_arch();
>  
> -	reserve_bios_regions();
> -
>  	efi_fake_memmap();
>  	efi_find_mirror();
>  	efi_esrt_init();
> @@ -1080,9 +1091,6 @@ void __init setup_arch(char **cmdline_p)
>  
>  	reserve_real_mode();
>  
> -	trim_platform_memory_ranges();
> -	trim_low_memory_range();
> -
>  	init_mem_mapping();
>  
>  	idt_setup_early_pf();
> 

Did not fully review for side-effects, but looks like a reasonable thing
to do!

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] x86/setup: consolidate early memory reservations
  2021-01-15  8:32 [PATCH 0/2] x86/setup: consolidate early memory reservations Mike Rapoport
  2021-01-15  8:32 ` [PATCH 1/2] " Mike Rapoport
  2021-01-15  8:32 ` [PATCH 2/2] x86/setup: merge several reservations of start of the memory Mike Rapoport
@ 2021-01-15 11:42 ` Baoquan He
  2021-01-15 11:56   ` Baoquan He
  2 siblings, 1 reply; 13+ messages in thread
From: Baoquan He @ 2021-01-15 11:42 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andrea Arcangeli, Borislav Petkov,
	David Hildenbrand, H. Peter Anvin, Ingo Molnar, Mel Gorman,
	Michal Hocko, Mike Rapoport, Qian Cai, Thomas Gleixner,
	Vlastimil Babka, linux-kernel, linux-mm, x86

On 01/15/21 at 10:32am, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Hi,
> 
> David noticed that we do some of memblock_reserve() calls after allocations
> are possible:
> 
> https://lore.kernel.org/lkml/6ba6bde3-1520-5cd0-f987-32d543f0b79f@redhat.com

Thanks for CC-ing me, so I think the above patch from Roman is dangerous.
KASLR does put kernel randomly in a place, but we did a brutal parse to
get SRAT table so that we know where is hotpluggable area during boot
decompression stage. In kernel, at the beginning, we don't know that
before ACPI init. Roman's patch is wrong if I don't miss something.

I will add comment in that thread.

Thanks
Baoquan

> 
> For now there is no actual problem because in top-down mode we allocate
> from the end of the memory and in bottom-up mode we allocate above the
> kernel image. But there is a patch in the mm tree that allow bottom-up
> allocations below the kernel:
> 
> https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com
> 
> and with this change we may get a memory corruption if an allocation steps
> on some of the firmware areas that are yet to be reserved.
> 
> The below patches consolidate early memory reservations done during
> setup_arch() so that memory used by firmware, bootloader, kernel text/data
> and the memory that should be excluded from the available memory for
> whatever other reason is reserved before memblock allocations are possible.
> 
> The patches are vs v5.11-rc3-mmots-2021-01-12-02-00 as I think they are
> prerequisite for the memblock bottom-up changes, but if needed I can rebase
> then on another tree.
> 
> Mike Rapoport (2):
>   x86/setup: consolidate early memory reservations
>   x86/setup: merge several reservations of start of the memory
> 
>  arch/x86/kernel/setup.c | 85 +++++++++++++++++++++--------------------
>  1 file changed, 43 insertions(+), 42 deletions(-)
> 
> -- 
> 2.28.0
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] x86/setup: consolidate early memory reservations
  2021-01-15 11:42 ` [PATCH 0/2] x86/setup: consolidate early memory reservations Baoquan He
@ 2021-01-15 11:56   ` Baoquan He
  0 siblings, 0 replies; 13+ messages in thread
From: Baoquan He @ 2021-01-15 11:56 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andrea Arcangeli, Borislav Petkov,
	David Hildenbrand, H. Peter Anvin, Ingo Molnar, Mel Gorman,
	Michal Hocko, Mike Rapoport, Qian Cai, Thomas Gleixner,
	Vlastimil Babka, linux-kernel, linux-mm, x86

On 01/15/21 at 07:42pm, Baoquan He wrote:
> On 01/15/21 at 10:32am, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Hi,
> > 
> > David noticed that we do some of memblock_reserve() calls after allocations
> > are possible:
> > 
> > https://lore.kernel.org/lkml/6ba6bde3-1520-5cd0-f987-32d543f0b79f@redhat.com
> 
> Thanks for CC-ing me, so I think the above patch from Roman is dangerous.
> KASLR does put kernel randomly in a place, but we did a brutal parse to
> get SRAT table so that we know where is hotpluggable area during boot
> decompression stage. In kernel, at the beginning, we don't know that
> before ACPI init. Roman's patch is wrong if I don't miss something.

Sorry, I was wrong. Bottom up searching disregarding kernel end is
good optimization. Please ignore this noise.

> 
> I will add comment in that thread.
> 
> Thanks
> Baoquan
> 
> > 
> > For now there is no actual problem because in top-down mode we allocate
> > from the end of the memory and in bottom-up mode we allocate above the
> > kernel image. But there is a patch in the mm tree that allow bottom-up
> > allocations below the kernel:
> > 
> > https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com
> > 
> > and with this change we may get a memory corruption if an allocation steps
> > on some of the firmware areas that are yet to be reserved.
> > 
> > The below patches consolidate early memory reservations done during
> > setup_arch() so that memory used by firmware, bootloader, kernel text/data
> > and the memory that should be excluded from the available memory for
> > whatever other reason is reserved before memblock allocations are possible.
> > 
> > The patches are vs v5.11-rc3-mmots-2021-01-12-02-00 as I think they are
> > prerequisite for the memblock bottom-up changes, but if needed I can rebase
> > then on another tree.
> > 
> > Mike Rapoport (2):
> >   x86/setup: consolidate early memory reservations
> >   x86/setup: merge several reservations of start of the memory
> > 
> >  arch/x86/kernel/setup.c | 85 +++++++++++++++++++++--------------------
> >  1 file changed, 43 insertions(+), 42 deletions(-)
> > 
> > -- 
> > 2.28.0
> > 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] x86/setup: consolidate early memory reservations
  2021-01-15  8:32 ` [PATCH 1/2] " Mike Rapoport
  2021-01-15 10:10   ` David Hildenbrand
@ 2021-01-25 14:50   ` Borislav Petkov
  2021-01-25 15:31     ` Mike Rapoport
  2021-01-25 14:59   ` Borislav Petkov
  2 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-01-25 14:50 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Qian Cai, Thomas Gleixner, Vlastimil Babka,
	linux-kernel, linux-mm, x86

On Fri, Jan 15, 2021 at 10:32:54AM +0200, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The early reservations of memory areas used by the firmware, bootloader,
> kernel text and data are spread over setup_arch(). Moreover, some of them
> happen *after* memblock allocations, e.g trim_platform_memory_ranges() and
> trim_low_memory_range() are called after reserve_real_mode() that allocates
> memory.
> 
> We did not observe corruption of these memory regions because memblock

Make that "We" impersonal, passive voice pls.

> always allocates memory either from the end of memory (in top-down mode) or
> above the kernel image (in bottom-up mode). However, the bottom up mode is
> going to be updated to span the entire memory [1] to avoid limitations
> caused by KASLR.
> 
> Consolidate early memory reservations in a dedicated function to improve
> robustness against future changes. Having the early reservations in one
> place also makes it clearer what memory must be reserved before we allow
> memblock allocations.

Would it make sense to have a check with a WARN or so to catch early
reservations which get added after memblock allocations have been
allowed? To catch people who don't pay attention...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] x86/setup: merge several reservations of start of the memory
  2021-01-15  8:32 ` [PATCH 2/2] x86/setup: merge several reservations of start of the memory Mike Rapoport
  2021-01-15 10:09   ` David Hildenbrand
@ 2021-01-25 14:55   ` Borislav Petkov
  1 sibling, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2021-01-25 14:55 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Qian Cai, Thomas Gleixner, Vlastimil Babka,
	linux-kernel, linux-mm, x86

On Fri, Jan 15, 2021 at 10:32:55AM +0200, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Currently the first several pages are reserved both to avoid leaking their
> contents on systems with L1TF and to avoid corrupting BIOS memory.
> 
> Merge the two memory reservations.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/x86/kernel/setup.c | 29 +++++++++++------------------
>  1 file changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 32cd2e790a0a..3f2fd67240f8 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -715,20 +715,6 @@ static int __init parse_reservelow(char *p)
>  
>  early_param("reservelow", parse_reservelow);
>  
> -static void __init trim_low_memory_range(void)
> -{
> -	/*
> -	 * A special case is the first 4Kb of memory;
> -	 * This is a BIOS owned area, not kernel ram, but generally
> -	 * not listed as such in the E820 table.
> -	 *
> -	 * This typically reserves additional memory (64KiB by default)
> -	 * since some BIOSes are known to corrupt low memory.  See the
> -	 * Kconfig help text for X86_RESERVE_LOW.
> -	 */
> -	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> -}
> -
>  static void __init early_reserve_memory(void)
>  {
>  	/*
> @@ -741,10 +727,18 @@ static void __init early_reserve_memory(void)
>  			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
>  
>  	/*
> -	 * Make sure page 0 is always reserved because on systems with
> -	 * L1TF its contents can be leaked to user processes.
> +	 * The first 4Kb of memory is a BIOS owned area, but generally it is
> +	 * not listed as such in the E820 table.
> +	 *
> +	 * Reserve the first memory page and typically some additional
> +	 * memory (64KiB by default) since some BIOSes are known to corrupt
> +	 * low memory. See the Kconfig help text for X86_RESERVE_LOW.
> +	 *
> +	 * In addition, we must make sure page 0 is always reserved because

s/we must//

Other than that:

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] x86/setup: consolidate early memory reservations
  2021-01-15  8:32 ` [PATCH 1/2] " Mike Rapoport
  2021-01-15 10:10   ` David Hildenbrand
  2021-01-25 14:50   ` Borislav Petkov
@ 2021-01-25 14:59   ` Borislav Petkov
  2021-01-25 15:33     ` Mike Rapoport
  2 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2021-01-25 14:59 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Qian Cai, Thomas Gleixner, Vlastimil Babka,
	linux-kernel, linux-mm, x86

On Fri, Jan 15, 2021 at 10:32:54AM +0200, Mike Rapoport wrote:
> +	trim_low_memory_range();

Btw, you can get rid of that one too:

/*
 * Here we put platform-specific memory range workarounds, i.e.
 * memory known to be corrupt or otherwise in need to be reserved on
 * specific platforms.
 *
 * If this gets used more widely it could use a real dispatch mechanism.
 */
static void __init trim_platform_memory_ranges(void)
{
        trim_snb_memory();
}

yeah, yeah, we can do a real dispatch mechanism but we didn't need one
since 2012 so I guess we can get rid of trim_platform_memory_ranges()
and call trim_snb_memory() directly and simplify it even more.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] x86/setup: consolidate early memory reservations
  2021-01-25 14:50   ` Borislav Petkov
@ 2021-01-25 15:31     ` Mike Rapoport
  2021-01-25 16:56       ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Rapoport @ 2021-01-25 15:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Qian Cai, Thomas Gleixner, Vlastimil Babka,
	linux-kernel, linux-mm, x86

On Mon, Jan 25, 2021 at 03:50:41PM +0100, Borislav Petkov wrote:
> On Fri, Jan 15, 2021 at 10:32:54AM +0200, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > The early reservations of memory areas used by the firmware, bootloader,
> > kernel text and data are spread over setup_arch(). Moreover, some of them
> > happen *after* memblock allocations, e.g trim_platform_memory_ranges() and
> > trim_low_memory_range() are called after reserve_real_mode() that allocates
> > memory.
> > 
> > We did not observe corruption of these memory regions because memblock
> 
> Make that "We" impersonal, passive voice pls.

Ok.
 
> > always allocates memory either from the end of memory (in top-down mode) or
> > above the kernel image (in bottom-up mode). However, the bottom up mode is
> > going to be updated to span the entire memory [1] to avoid limitations
> > caused by KASLR.
> > 
> > Consolidate early memory reservations in a dedicated function to improve
> > robustness against future changes. Having the early reservations in one
> > place also makes it clearer what memory must be reserved before we allow
> > memblock allocations.
> 
> Would it make sense to have a check with a WARN or so to catch early
> reservations which get added after memblock allocations have been
> allowed? To catch people who don't pay attention...

This would make sense but it's tricky. From memblock perspective,
allocations are always allowed and it is the user responsibility to ensure
all the early reservations are done before allocating memory.

So adding such a WARN would require a new memblock API and it's adoption by
all architectures, which is way beyond the scope of this series :)

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] x86/setup: consolidate early memory reservations
  2021-01-25 14:59   ` Borislav Petkov
@ 2021-01-25 15:33     ` Mike Rapoport
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Rapoport @ 2021-01-25 15:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Qian Cai, Thomas Gleixner, Vlastimil Babka,
	linux-kernel, linux-mm, x86

On Mon, Jan 25, 2021 at 03:59:11PM +0100, Borislav Petkov wrote:
> On Fri, Jan 15, 2021 at 10:32:54AM +0200, Mike Rapoport wrote:
> > +	trim_low_memory_range();
> 
> Btw, you can get rid of that one too:
> 
> /*
>  * Here we put platform-specific memory range workarounds, i.e.
>  * memory known to be corrupt or otherwise in need to be reserved on
>  * specific platforms.
>  *
>  * If this gets used more widely it could use a real dispatch mechanism.
>  */
> static void __init trim_platform_memory_ranges(void)
> {
>         trim_snb_memory();
> }
> 
> yeah, yeah, we can do a real dispatch mechanism but we didn't need one
> since 2012 so I guess we can get rid of trim_platform_memory_ranges()
> and call trim_snb_memory() directly and simplify it even more.

Ok.
 
> Thx.
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] x86/setup: consolidate early memory reservations
  2021-01-25 15:31     ` Mike Rapoport
@ 2021-01-25 16:56       ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2021-01-25 16:56 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, David Hildenbrand,
	H. Peter Anvin, Ingo Molnar, Mel Gorman, Michal Hocko,
	Mike Rapoport, Qian Cai, Thomas Gleixner, Vlastimil Babka,
	linux-kernel, linux-mm, x86

On Mon, Jan 25, 2021 at 05:31:14PM +0200, Mike Rapoport wrote:
> This would make sense but it's tricky. From memblock perspective,
> allocations are always allowed and it is the user responsibility to ensure
> all the early reservations are done before allocating memory.

Yah, I don't trust my users to know that for sure...

> So adding such a WARN would require a new memblock API and it's adoption by
> all architectures, which is way beyond the scope of this series :)

So definitely not for those series but I could imagine something like

memblock_reserve()
	
	if (memblock_allocations_allowed())
		WARN

or so. This way you don't need to touch the archtectures. It all depends
on what the other arches need/use.

Or you could even make that a new memblock_reserve_warn() thing or so
and wrap that functionality in it and have x86 call it only...

Anyway, something to that effect.

As to those two patches, you can add

Acked-by: Borislav Petkov <bp@suse.de>

to the next revision since akpm is going to take them.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-01-25 16:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-15  8:32 [PATCH 0/2] x86/setup: consolidate early memory reservations Mike Rapoport
2021-01-15  8:32 ` [PATCH 1/2] " Mike Rapoport
2021-01-15 10:10   ` David Hildenbrand
2021-01-25 14:50   ` Borislav Petkov
2021-01-25 15:31     ` Mike Rapoport
2021-01-25 16:56       ` Borislav Petkov
2021-01-25 14:59   ` Borislav Petkov
2021-01-25 15:33     ` Mike Rapoport
2021-01-15  8:32 ` [PATCH 2/2] x86/setup: merge several reservations of start of the memory Mike Rapoport
2021-01-15 10:09   ` David Hildenbrand
2021-01-25 14:55   ` Borislav Petkov
2021-01-15 11:42 ` [PATCH 0/2] x86/setup: consolidate early memory reservations Baoquan He
2021-01-15 11:56   ` Baoquan He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).