* [PATCH 0/2] x86: SME: Kexec/kdump memory loading fix
@ 2019-06-14 21:15 Lendacky, Thomas
  2019-06-14 21:15 ` [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved Lendacky, Thomas
  2019-06-14 21:15 ` [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption Lendacky, Thomas
  0 siblings, 2 replies; 12+ messages in thread
From: Lendacky, Thomas @ 2019-06-14 21:15 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra

This series addresses an issue related to kexec/kdump when SME is active.
The SME support uses a workarea located after the end of the kernel to
perform "in-place" encryption of the kernel. When kexec/kdump is used, it
is possible that some other data used by kexec/kdump could be placed in this
area of memory, which would cause the kexec/kdump boot of the kernel to fail.

Create a section for SME in vmlinux.lds.S that is positioned after "_end",
so that the memory it occupies will be reclaimed after its use during
boot. Since it is part of the kernel image, there is no worry now that
kexec/kdump will place data in the SME workarea when installing the kexec/
kdump kernel. As part of this fix, clarify what occupied kernel memory is
reserved and what parts of the kernel memory are discarded.

The following patches are included:
- Identify and document what parts of the kernel image are reserved (saved)
  and what is discarded.
- Create a new SME workarea section that will be reclaimed after its use
  during boot, thus allowing the memory to be returned for general use.

This patch series is based on tip/master.

---

Tom Lendacky (2):
  x86/mm: Identify the end of the kernel area to be reserved
  x86/mm: Create an SME workarea in the kernel for early encryption

 arch/x86/include/asm/sections.h    |  2 ++
 arch/x86/kernel/setup.c            |  8 +++++++-
 arch/x86/kernel/vmlinux.lds.S      | 33 +++++++++++++++++++++++++++++-
 arch/x86/mm/mem_encrypt_identity.c | 22 ++++++++++++++++++--
 4 files changed, 61 insertions(+), 4 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved
  2019-06-14 21:15 [PATCH 0/2] x86: SME: Kexec/kdump memory loading fix Lendacky, Thomas
@ 2019-06-14 21:15 ` Lendacky, Thomas
  2019-06-14 22:06   ` Dave Hansen
                     ` (3 more replies)
  2019-06-14 21:15 ` [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption Lendacky, Thomas
  1 sibling, 4 replies; 12+ messages in thread
From: Lendacky, Thomas @ 2019-06-14 21:15 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

The memory occupied by the kernel is reserved using memblock_reserve()
in setup_arch(). Currently, the area is from symbols _text to __bss_stop.
Everything after __bss_stop must be specifically reserved otherwise it
is discarded. This is not clearly documented.

Add a new symbol, __end_of_kernel_reserve, that more readily identifies
what is reserved, along with comments that indicate what is reserved,
what is discarded and what needs to be done to prevent a section from
being discarded.

Cc: Baoquan He <bhe@redhat.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/sections.h | 2 ++
 arch/x86/kernel/setup.c         | 8 +++++++-
 arch/x86/kernel/vmlinux.lds.S   | 9 ++++++++-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 8ea1cfdbeabc..71b32f2570ab 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -13,4 +13,6 @@ extern char __end_rodata_aligned[];
 extern char __end_rodata_hpage_align[];
 #endif
 
+extern char __end_of_kernel_reserve[];
+
 #endif	/* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 08a5f4a131f5..32eb70625b3b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -827,8 +827,14 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 
 void __init setup_arch(char **cmdline_p)
 {
+	/*
+	 * Reserve the memory occupied by the kernel between _text and
+	 * __end_of_kernel_reserve symbols. Any kernel sections after the
+	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
+	 * separate memblock_reserve() or it will be discarded.
+	 */
 	memblock_reserve(__pa_symbol(_text),
-			 (unsigned long)__bss_stop - (unsigned long)_text);
+			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
 
 	/*
 	 * Make sure page 0 is always reserved because on systems with
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 0850b5149345..ca2252ca6ad7 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -368,6 +368,14 @@ SECTIONS
 		__bss_stop = .;
 	}
 
+	/*
+	 * The memory occupied from _text to here, __end_of_kernel_reserve, is
+	 * automatically reserved in setup_arch(). Anything after here must be
+	 * explicitly reserved using memblock_reserve() or it will be discarded
+	 * and treated as available memory.
+	 */
+	__end_of_kernel_reserve = .;
+
 	. = ALIGN(PAGE_SIZE);
 	.brk : AT(ADDR(.brk) - LOAD_OFFSET) {
 		__brk_base = .;
@@ -382,7 +390,6 @@ SECTIONS
 	STABS_DEBUG
 	DWARF_DEBUG
 
-	/* Sections to be discarded */
 	DISCARDS
 	/DISCARD/ : {
 		*(.eh_frame)
-- 
2.17.1



* [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption
  2019-06-14 21:15 [PATCH 0/2] x86: SME: Kexec/kdump memory loading fix Lendacky, Thomas
  2019-06-14 21:15 ` [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved Lendacky, Thomas
@ 2019-06-14 21:15 ` Lendacky, Thomas
  2019-06-17 11:02   ` Borislav Petkov
  1 sibling, 1 reply; 12+ messages in thread
From: Lendacky, Thomas @ 2019-06-14 21:15 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

In order for the kernel to be encrypted "in place" during boot, a workarea
outside of the kernel must be used. This SME workarea used during early
encryption of the kernel is situated on a 2MB boundary after the end of
the kernel text, data, etc. sections (_end).  This works well during
initial boot of a compressed kernel because of the relocation used for
decompression of the kernel. But when performing a kexec boot, there's a
chance that the SME workarea may not be mapped by the kexec pagetables or
that some of the other data used by kexec could exist in this range.

Create a section for SME in vmlinux.lds.S. Position it after "_end", which
is after "__end_of_kernel_reserve", so that the memory it occupies will be
reclaimed during boot. Since this area is all zeroes, it compresses well.
This new section will be part of the kernel image, so kexec will account
for it in pagetable mappings and placement of data after the kernel.

Here's an example of a kernel size without and with the SME section:
	without:
		vmlinux:	36,501,616
		bzImage:	 6,497,344

		100000000-47f37ffff : System RAM
		  1e4000000-1e47677d4 : Kernel code	(0x7677d4)
		  1e47677d5-1e4e2e0bf : Kernel data	(0x6c68ea)
		  1e5074000-1e5372fff : Kernel bss	(0x2fefff)

	with:
		vmlinux:	44,419,408
		bzImage:	 6,503,136

		880000000-c7ff7ffff : System RAM
		  8cf000000-8cf7677d4 : Kernel code	(0x7677d4)
		  8cf7677d5-8cfe2e0bf : Kernel data	(0x6c68ea)
		  8d0074000-8d0372fff : Kernel bss	(0x2fefff)

Cc: Baoquan He <bhe@redhat.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/vmlinux.lds.S      | 24 ++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt_identity.c | 22 ++++++++++++++++++++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index ca2252ca6ad7..a7aa65b44c71 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -387,6 +387,30 @@ SECTIONS
 	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
 	_end = .;
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/*
+	 * SME workarea section: Lives outside of the kernel proper (_text -
+	 * _end) for performing in-place encryption of the kernel during boot.
+	 *
+	 * Resides after _end because even though the .brk section is after
+	 * __end_of_kernel_reserve, the .brk section is later reserved as a
+	 * part of the kernel. It is used in very early boot code and not
+	 * needed after that, so it is located after __end_of_kernel_reserve
+	 * so that it will be discarded and become part of the available
+	 * memory.
+	 *
+	 * Resides on a 2MB boundary to simplify the pagetable setup used for
+	 * the encryption.
+	 */
+	. = ALIGN(HPAGE_SIZE);
+	.sme : AT(ADDR(.sme) - LOAD_OFFSET) {
+		__sme_begin = .;
+		*(.sme)
+		. = ALIGN(HPAGE_SIZE);
+		__sme_end = .;
+	}
+#endif
+
 	STABS_DEBUG
 	DWARF_DEBUG
 
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 4aa9b1480866..c55c2ec8fb12 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -73,6 +73,19 @@ struct sme_populate_pgd_data {
 	unsigned long vaddr_end;
 };
 
+/*
+ * This work area lives in the .sme section, which lives outside of
+ * the kernel proper. It is sized to hold the intermediate copy buffer
+ * and more than enough pagetable pages.
+ *
+ * By using this section, the kernel can be encrypted in place and we
+ * avoid any possibility of boot parameters or initramfs images being
+ * placed such that the in-place encryption logic overwrites them.  This
+ * section is 2MB aligned to allow for simple pagetable setup using only
+ * PMD entries (see vmlinux.lds.S).
+ */
+static char sme_workarea[2 * PMD_PAGE_SIZE] __section(.sme);
+
 static char sme_cmdline_arg[] __initdata = "mem_encrypt";
 static char sme_cmdline_on[]  __initdata = "on";
 static char sme_cmdline_off[] __initdata = "off";
@@ -314,8 +327,13 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
 	}
 #endif
 
-	/* Set the encryption workarea to be immediately after the kernel */
-	workarea_start = kernel_end;
+	/*
+	 * We're running identity mapped, so we must obtain the address to the
+	 * SME encryption workarea using rip-relative addressing.
+	 */
+	asm ("lea sme_workarea(%%rip), %0"
+	     : "=r" (workarea_start)
+	     : "p" (sme_workarea));
 
 	/*
 	 * Calculate required number of workarea bytes needed:
-- 
2.17.1



* Re: [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved
  2019-06-14 21:15 ` [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved Lendacky, Thomas
@ 2019-06-14 22:06   ` Dave Hansen
  2019-06-16 12:03   ` lijiang
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2019-06-14 22:06 UTC (permalink / raw)
  To: Lendacky, Thomas, linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

On 6/14/19 2:15 PM, Lendacky, Thomas wrote:
> +	/*
> +	 * The memory occupied from _text to here, __end_of_kernel_reserve, is
> +	 * automatically reserved in setup_arch(). Anything after here must be
> +	 * explicitly reserved using memblock_reserve() or it will be discarded
> +	 * and treated as available memory.
> +	 */
> +	__end_of_kernel_reserve = .;

This new stuff looks really nice to me, including the comments.  Thanks
for doing those!

For both patches:

Reviewed-by: Dave Hansen <dave.hansen@intel.com>



* Re: [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved
  2019-06-14 21:15 ` [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved Lendacky, Thomas
  2019-06-14 22:06   ` Dave Hansen
@ 2019-06-16 12:03   ` lijiang
  2019-06-17  1:54   ` Baoquan He
  2019-06-17 10:47   ` Borislav Petkov
  3 siblings, 0 replies; 12+ messages in thread
From: lijiang @ 2019-06-16 12:03 UTC (permalink / raw)
  To: Lendacky, Thomas, linux-kernel, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He

After applying the patch series (v2), the kexec'd kernel and the kdump
kernel can successfully boot.

Thanks.

Tested-by: Lianbo Jiang <lijiang@redhat.com>

On 06/15/2019 05:15 AM, Lendacky, Thomas wrote:
> The memory occupied by the kernel is reserved using memblock_reserve()
> in setup_arch(). Currently, the area is from symbols _text to __bss_stop.
> Everything after __bss_stop must be specifically reserved otherwise it
> is discarded. This is not clearly documented.
> 
> Add a new symbol, __end_of_kernel_reserve, that more readily identifies
> what is reserved, along with comments that indicate what is reserved,
> what is discarded and what needs to be done to prevent a section from
> being discarded.
> 
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Lianbo Jiang <lijiang@redhat.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

* Re: [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved
  2019-06-14 21:15 ` [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved Lendacky, Thomas
  2019-06-14 22:06   ` Dave Hansen
  2019-06-16 12:03   ` lijiang
@ 2019-06-17  1:54   ` Baoquan He
  2019-06-17 10:47   ` Borislav Petkov
  3 siblings, 0 replies; 12+ messages in thread
From: Baoquan He @ 2019-06-17  1:54 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Lianbo Jiang

On 06/14/19 at 09:15pm, Lendacky, Thomas wrote:
> The memory occupied by the kernel is reserved using memblock_reserve()
> in setup_arch(). Currently, the area is from symbols _text to __bss_stop.
> Everything after __bss_stop must be specifically reserved otherwise it
> is discarded. This is not clearly documented.
> 
> Add a new symbol, __end_of_kernel_reserve, that more readily identifies
> what is reserved, along with comments that indicate what is reserved,
> what is discarded and what needs to be done to prevent a section from
> being discarded.
> 
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Lianbo Jiang <lijiang@redhat.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

Looks good to me, thanks. To the series,

Reviewed-by: Baoquan He <bhe@redhat.com>

Thanks
Baoquan



* Re: [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved
  2019-06-14 21:15 ` [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved Lendacky, Thomas
                     ` (2 preceding siblings ...)
  2019-06-17  1:54   ` Baoquan He
@ 2019-06-17 10:47   ` Borislav Petkov
  2019-06-18  1:43     ` Lendacky, Thomas
  3 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2019-06-17 10:47 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

On Fri, Jun 14, 2019 at 09:15:18PM +0000, Lendacky, Thomas wrote:
> The memory occupied by the kernel is reserved using memblock_reserve()
> in setup_arch(). Currently, the area is from symbols _text to __bss_stop.
> Everything after __bss_stop must be specifically reserved otherwise it
> is discarded. This is not clearly documented.

Hmm, so I see this in arch/x86/kernel/vmlinux.lds.S after _end:

        _end = .;

        STABS_DEBUG
        DWARF_DEBUG

        /* Sections to be discarded */
        DISCARDS
        /DISCARD/ : {
                *(.eh_frame)
        }

and over DISCARDS:

/*
 * Default discarded sections.
 *
 * Some archs want to discard exit text/data at runtime rather than
 * link time due to cross-section references such as alt instructions,
 * bug table, eh_frame, etc.  DISCARDS must be the last of output
 * section definitions so that such archs put those in earlier section
 * definitions.
 */
#define DISCARDS

That sounds like it is documented to me, or do you mean something else?

> Add a new symbol, __end_of_kernel_reserve, that more readily identifies
> what is reserved, along with comments that indicate what is reserved,
> what is discarded and what needs to be done to prevent a section from
> being discarded.
> 
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Lianbo Jiang <lijiang@redhat.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/sections.h | 2 ++
>  arch/x86/kernel/setup.c         | 8 +++++++-
>  arch/x86/kernel/vmlinux.lds.S   | 9 ++++++++-
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
> index 8ea1cfdbeabc..71b32f2570ab 100644
> --- a/arch/x86/include/asm/sections.h
> +++ b/arch/x86/include/asm/sections.h
> @@ -13,4 +13,6 @@ extern char __end_rodata_aligned[];
>  extern char __end_rodata_hpage_align[];
>  #endif
>  
> +extern char __end_of_kernel_reserve[];
> +
>  #endif	/* _ASM_X86_SECTIONS_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 08a5f4a131f5..32eb70625b3b 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -827,8 +827,14 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>  
>  void __init setup_arch(char **cmdline_p)
>  {
> +	/*
> +	 * Reserve the memory occupied by the kernel between _text and
> +	 * __end_of_kernel_reserve symbols. Any kernel sections after the
> +	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
> +	 * separate memblock_reserve() or it will be discarded.

s/it/they/

> +	 */
>  	memblock_reserve(__pa_symbol(_text),
> -			 (unsigned long)__bss_stop - (unsigned long)_text);
> +			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
>  
>  	/*
>  	 * Make sure page 0 is always reserved because on systems with
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 0850b5149345..ca2252ca6ad7 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -368,6 +368,14 @@ SECTIONS
>  		__bss_stop = .;
>  	}
>  
> +	/*
> +	 * The memory occupied from _text to here, __end_of_kernel_reserve, is
> +	 * automatically reserved in setup_arch(). Anything after here must be
> +	 * explicitly reserved using memblock_reserve() or it will be discarded
> +	 * and treated as available memory.
> +	 */
> +	__end_of_kernel_reserve = .;
> +
>  	. = ALIGN(PAGE_SIZE);
>  	.brk : AT(ADDR(.brk) - LOAD_OFFSET) {
>  		__brk_base = .;
> @@ -382,7 +390,6 @@ SECTIONS
>  	STABS_DEBUG
>  	DWARF_DEBUG
>  
> -	/* Sections to be discarded */

Huh?

They're called DISCARD* ...

>       DISCARDS
>       /DISCARD/ : {
>               *(.eh_frame)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption
  2019-06-14 21:15 ` [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption Lendacky, Thomas
@ 2019-06-17 11:02   ` Borislav Petkov
  2019-06-18  1:49     ` Lendacky, Thomas
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2019-06-17 11:02 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

On Fri, Jun 14, 2019 at 09:15:19PM +0000, Lendacky, Thomas wrote:
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index ca2252ca6ad7..a7aa65b44c71 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -387,6 +387,30 @@ SECTIONS
>  	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
>  	_end = .;
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	/*
> +	 * SME workarea section: Lives outside of the kernel proper (_text -
> +	 * _end) for performing in-place encryption of the kernel during boot.
> +	 *
> +	 * Resides after _end because even though the .brk section is after
> +	 * __end_of_kernel_reserve, the .brk section is later reserved as a
> +	 * part of the kernel. It is used in very early boot code and not
> +	 * needed after that, so it is located after __end_of_kernel_reserve
> +	 * so that it will be discarded and become part of the available
> +	 * memory.
> +	 *
> +	 * Resides on a 2MB boundary to simplify the pagetable setup used for
> +	 * the encryption.
> +	 */
> +	. = ALIGN(HPAGE_SIZE);
> +	.sme : AT(ADDR(.sme) - LOAD_OFFSET) {

Should we call that section something more generic as

	.early_scratch

or so?

Someone else might need something like that too, in the future...

Also, the DISCARDS sections do get freed at runtime so why not make it
part of the DISCARD section...?

> +		__sme_begin = .;
> +		*(.sme)
> +		. = ALIGN(HPAGE_SIZE);
> +		__sme_end = .;
> +	}
> +#endif
> +
>  	STABS_DEBUG
>  	DWARF_DEBUG
>  
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 4aa9b1480866..c55c2ec8fb12 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -73,6 +73,19 @@ struct sme_populate_pgd_data {
>  	unsigned long vaddr_end;
>  };
>  
> +/*
> + * This work area lives in the .sme section, which lives outside of
> + * the kernel proper. It is sized to hold the intermediate copy buffer
> + * and more than enough pagetable pages.
> + *
> + * By using this section, the kernel can be encrypted in place and we

replace that "we" with an impartial passive formulation.

Other than that, I like the commenting, very helpful!

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved
  2019-06-17 10:47   ` Borislav Petkov
@ 2019-06-18  1:43     ` Lendacky, Thomas
  2019-06-18  9:37       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Lendacky, Thomas @ 2019-06-18  1:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang



On 6/17/19 5:47 AM, Borislav Petkov wrote:
> On Fri, Jun 14, 2019 at 09:15:18PM +0000, Lendacky, Thomas wrote:
>> The memory occupied by the kernel is reserved using memblock_reserve()
>> in setup_arch(). Currently, the area is from symbols _text to __bss_stop.
>> Everything after __bss_stop must be specifically reserved otherwise it
>> is discarded. This is not clearly documented.
> 
> Hmm, so I see this in arch/x86/kernel/vmlinux.lds.S after _end:
> 
>         _end = .;
> 
>         STABS_DEBUG
>         DWARF_DEBUG
> 
>         /* Sections to be discarded */
>         DISCARDS
>         /DISCARD/ : {
>                 *(.eh_frame)
>         }
> 
> and over DISCARDS:
> 
> /*
>  * Default discarded sections.
>  *
>  * Some archs want to discard exit text/data at runtime rather than
>  * link time due to cross-section references such as alt instructions,
>  * bug table, eh_frame, etc.  DISCARDS must be the last of output
>  * section definitions so that such archs put those in earlier section
>  * definitions.
>  */
> #define DISCARDS
> 
> That sounds like it is documented to me, or do you mean something else?

Yes and no...  it doesn't say how it is done, namely through the use of
memblock_reserve() calls and when and where those occur.

> 
>> Add a new symbol, __end_of_kernel_reserve, that more readily identifies
>> what is reserved, along with comments that indicate what is reserved,
>> what is discarded and what needs to be done to prevent a section from
>> being discarded.
>>
>> Cc: Baoquan He <bhe@redhat.com>
>> Cc: Lianbo Jiang <lijiang@redhat.com>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/sections.h | 2 ++
>>  arch/x86/kernel/setup.c         | 8 +++++++-
>>  arch/x86/kernel/vmlinux.lds.S   | 9 ++++++++-
>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
>> index 8ea1cfdbeabc..71b32f2570ab 100644
>> --- a/arch/x86/include/asm/sections.h
>> +++ b/arch/x86/include/asm/sections.h
>> @@ -13,4 +13,6 @@ extern char __end_rodata_aligned[];
>>  extern char __end_rodata_hpage_align[];
>>  #endif
>>  
>> +extern char __end_of_kernel_reserve[];
>> +
>>  #endif	/* _ASM_X86_SECTIONS_H */
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 08a5f4a131f5..32eb70625b3b 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -827,8 +827,14 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
>>  
>>  void __init setup_arch(char **cmdline_p)
>>  {
>> +	/*
>> +	 * Reserve the memory occupied by the kernel between _text and
>> +	 * __end_of_kernel_reserve symbols. Any kernel sections after the
>> +	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
>> +	 * separate memblock_reserve() or it will be discarded.
> 
> s/it/they/
> 
>> +	 */
>>  	memblock_reserve(__pa_symbol(_text),
>> -			 (unsigned long)__bss_stop - (unsigned long)_text);
>> +			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
>>  
>>  	/*
>>  	 * Make sure page 0 is always reserved because on systems with
>> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>> index 0850b5149345..ca2252ca6ad7 100644
>> --- a/arch/x86/kernel/vmlinux.lds.S
>> +++ b/arch/x86/kernel/vmlinux.lds.S
>> @@ -368,6 +368,14 @@ SECTIONS
>>  		__bss_stop = .;
>>  	}
>>  
>> +	/*
>> +	 * The memory occupied from _text to here, __end_of_kernel_reserve, is
>> +	 * automatically reserved in setup_arch(). Anything after here must be
>> +	 * explicitly reserved using memblock_reserve() or it will be discarded
>> +	 * and treated as available memory.
>> +	 */
>> +	__end_of_kernel_reserve = .;
>> +
>>  	. = ALIGN(PAGE_SIZE);
>>  	.brk : AT(ADDR(.brk) - LOAD_OFFSET) {
>>  		__brk_base = .;
>> @@ -382,7 +390,6 @@ SECTIONS
>>  	STABS_DEBUG
>>  	DWARF_DEBUG
>>  
>> -	/* Sections to be discarded */
> 
> Huh?
> 
> They're called DISCARD* ...

The comment above is more explicit about what will be discarded and
how not to have it discarded, so I removed this comment.

Thanks,
Tom

> 
>>       DISCARDS
>>       /DISCARD/ : {
>>               *(.eh_frame)
> 


* Re: [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption
  2019-06-17 11:02   ` Borislav Petkov
@ 2019-06-18  1:49     ` Lendacky, Thomas
  2019-06-18 10:11       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Lendacky, Thomas @ 2019-06-18  1:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

On 6/17/19 6:02 AM, Borislav Petkov wrote:
> On Fri, Jun 14, 2019 at 09:15:19PM +0000, Lendacky, Thomas wrote:
>> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
>> index ca2252ca6ad7..a7aa65b44c71 100644
>> --- a/arch/x86/kernel/vmlinux.lds.S
>> +++ b/arch/x86/kernel/vmlinux.lds.S
>> @@ -387,6 +387,30 @@ SECTIONS
>>  	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
>>  	_end = .;
>>  
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +	/*
>> +	 * SME workarea section: Lives outside of the kernel proper (_text -
>> +	 * _end) for performing in-place encryption of the kernel during boot.
>> +	 *
>> +	 * Resides after _end because even though the .brk section is after
>> +	 * __end_of_kernel_reserve, the .brk section is later reserved as a
>> +	 * part of the kernel. It is used in very early boot code and not
>> +	 * needed after that, so it is located after __end_of_kernel_reserve
>> +	 * so that it will be discarded and become part of the available
>> +	 * memory.
>> +	 *
>> +	 * Resides on a 2MB boundary to simplify the pagetable setup used for
>> +	 * the encryption.
>> +	 */
>> +	. = ALIGN(HPAGE_SIZE);
>> +	.sme : AT(ADDR(.sme) - LOAD_OFFSET) {
> 
> Should we call that section something more generic as
> 
> 	.early_scratch
> 
> or so?
> 
> Someone else might need something like that too, in the future...

Whoever uses it in the future could rename it if desired, but I can do
that now. Is there a preferred name? I can make it .early_scratch
or .early_workarea.

> 
> Also, the DISCARDS sections do get freed at runtime so why not make it
> part of the DISCARD section...?

I think it's easier to show the alignment requirements that SME has for
this section by having it be its own section.

> 
>> +		__sme_begin = .;
>> +		*(.sme)
>> +		. = ALIGN(HPAGE_SIZE);
>> +		__sme_end = .;
>> +	}
>> +#endif
>> +
>>  	STABS_DEBUG
>>  	DWARF_DEBUG
>>  
>> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
>> index 4aa9b1480866..c55c2ec8fb12 100644
>> --- a/arch/x86/mm/mem_encrypt_identity.c
>> +++ b/arch/x86/mm/mem_encrypt_identity.c
>> @@ -73,6 +73,19 @@ struct sme_populate_pgd_data {
>>  	unsigned long vaddr_end;
>>  };
>>  
>> +/*
>> + * This work area lives in the .sme section, which lives outside of
>> + * the kernel proper. It is sized to hold the intermediate copy buffer
>> + * and more than enough pagetable pages.
>> + *
>> + * By using this section, the kernel can be encrypted in place and we
> 
> replace that "we" with an impartial passive formulation.

Ok.

> 
> Other than that, I like the commenting, very helpful!

I'll send out a V3 with the comments addressed (after giving a bit of time
for name suggestions).

Thanks,
Tom

> 
> Thx.
> 


* Re: [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved
  2019-06-18  1:43     ` Lendacky, Thomas
@ 2019-06-18  9:37       ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2019-06-18  9:37 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

On Tue, Jun 18, 2019 at 01:43:00AM +0000, Lendacky, Thomas wrote:
> Yes and no... it doesn't say how it is done, namely through the use of
> memblock_reserve() calls, or when and where those occur.

Ah ok, so you found that out and documented it now. Good.

:-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption
  2019-06-18  1:49     ` Lendacky, Thomas
@ 2019-06-18 10:11       ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2019-06-18 10:11 UTC (permalink / raw)
  To: Lendacky, Thomas
  Cc: linux-kernel, x86, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Baoquan He, Lianbo Jiang

On Tue, Jun 18, 2019 at 01:49:13AM +0000, Lendacky, Thomas wrote:
> Whoever uses it in the future could rename it if desired.  But I can do
> that now. Is there a preferred name?  I can leave it as .early_scratch
> or .early_workarea.

So looking at the readelf output of vmlinux, we already have .init.*
sections for stuff which gets freed after booting, but I'm guessing we
can't have the SME scratch area in the middle because you need to be
able to say which range gets encrypted without encrypting the scratch
area itself...

But you could call it .init.scratch or so, so that it fits with the
already existing naming nomenclature for ranges which get freed after
init.

> I think it's easier to show the alignment requirements that SME has for
> this section by having it be its own section.

Not only that, from ld.info:

"   The special output section name '/DISCARD/' may be used to discard
input sections.  Any input sections which are assigned to an output
section named '/DISCARD/' are not included in the output file."

but you want that section present in the output file.
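That ld behavior is easy to demonstrate (a sketch; the file and section names are arbitrary, and it assumes GNU ld, which treats a linker script passed as an input file as augmenting the default script rather than replacing it):

```shell
# A variable in a custom section, kept alive so the compiler emits it.
cat > demo.c <<'EOF'
static int dropped __attribute__((used, section(".dropme"))) = 2;
int main(void) { return 0; }
EOF

# A script whose only job is to discard that input section.
cat > discard.lds <<'EOF'
SECTIONS {
  /DISCARD/ : { *(.dropme) }
}
EOF

# Without the script, .dropme appears in the output file ...
cc demo.c -o demo_with
readelf -S demo_with | grep dropme

# ... with it, the section is absent, exactly as ld.info describes.
cc demo.c discard.lds -o demo_without
readelf -S demo_without | grep dropme || echo ".dropme discarded"
```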

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


end of thread, other threads:[~2019-06-18 10:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
2019-06-14 21:15 [PATCH 0/2] x86: SME: Kexec/kdump memory loading fix Lendacky, Thomas
2019-06-14 21:15 ` [PATCH v2 1/2] x86/mm: Identify the end of the kernel area to be reserved Lendacky, Thomas
2019-06-14 22:06   ` Dave Hansen
2019-06-16 12:03   ` lijiang
2019-06-17  1:54   ` Baoquan He
2019-06-17 10:47   ` Borislav Petkov
2019-06-18  1:43     ` Lendacky, Thomas
2019-06-18  9:37       ` Borislav Petkov
2019-06-14 21:15 ` [PATCH v2 2/2] x86/mm: Create an SME workarea in the kernel for early encryption Lendacky, Thomas
2019-06-17 11:02   ` Borislav Petkov
2019-06-18  1:49     ` Lendacky, Thomas
2019-06-18 10:11       ` Borislav Petkov
