* [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Update the x86 boot path to avoid the bare metal decompressor when
booting via the EFI stub. The bare metal decompressor inherits the
loader's 1:1 mapping of DRAM when entering in 64-bit mode, and assumes
that all of it is mapped read/write/execute, which will no longer be the
case on systems built to comply with recently tightened logo
requirements (*).

NOTE: this series depends on Thomas's __KERNEL_CS switch fix [8], which
is already queued in -tip as a fix.

Changes since v4 [7]:
- avoid CPUID calls after protocol negotiation but before configuring
  exception handling;
- drop patch removing redundant RSI pushes and pops from
  arch/x86/kernel/head_64.S
- rebase onto -tip x86/cc - the conflicts are mostly trivial and
  restricted to the last 4 patches in the series, so applying this onto
  a separate topic branch should be straightforward as well.

Changes since v3 [6]:
- trivial rebase onto Kirill's unaccepted memory series v13
- test SNP feature mask while running in the EFI boot services, and fail
  gracefully on a mismatch
- perform only the SEV init after ExitBootServices()

Changes since v2 [4]:
- update prose style to comply with -tip guidelines
- rebased onto Kirill's unaccepted memory series [3]
- add Kirill's ack to 4/5-level paging changes
- perform SEV init and SNP feature check after ExitBootServices(), to
  avoid corrupting the firmware's own SEV state
- split out preparatory refactor of handover entry code and BSS clearing
  (patches #1 to #4)

Changes since v1 [2]:
- streamline existing 4/5 level switching code and call it directly from
  the EFI stub - this is covered by the first 9 patches, which can be
  applied in isolation, if desired;
- deal with SEV/SNP init explicitly;
- clear BSS when booting via the 'handover protocol';
- switch to kernel CS before calling SEV init code in kernel proper.

---- v1 cover letter follows ----

This series is conceptually a combination of Evgeniy's series [0] and
mine [1], both of which attempt to make the early decompressor code more
amenable to executing in the EFI environment with stricter handling of
memory permissions.

My series [1] implemented zboot for x86 by getting rid of the entire
x86 decompressor, and replacing it with existing EFI code that does the
same but in a generic way. The downside of this is that only EFI boot is
supported, making it unviable for distros, which need to support BIOS
boot and hybrid EFI boot modes that omit the EFI stub.

Evgeniy's series [0] adapted the entire decompressor code flow so that
it can execute in the EFI context as well as the bare metal context.
This involves changes to the 1:1 mapping code, the page fault handlers,
etc., none of which are really needed when doing EFI boot in the first
place.

So this series attempts to occupy the middle ground here: it makes
minimal changes to the existing decompressor so some of it can be called
from the EFI stub. Then, it reimplements the EFI boot flow to decompress
the kernel and boot it directly, without relying on the trampoline
allocation code, page table code or page fault handling code. This
allows us to get rid of quite a bit of unsavory EFI stub code, and
replace it with two clear invocations of the EFI firmware APIs to clear
NX restrictions from allocations that have been populated with
executable code.
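
To illustrate what those firmware invocations look like: below is a
minimal sketch based on the EFI memory attributes protocol (which a
later patch in this series wires up); clear_nx() is a made-up name used
for illustration, while the protocol type, efi_bs_call() and
EFI_MEMORY_XP are the existing libstub/UEFI interfaces.

    static efi_status_t clear_nx(unsigned long addr, unsigned long size)
    {
            efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
            efi_memory_attribute_protocol_t *memattr;
            efi_status_t status;

            status = efi_bs_call(locate_protocol, &guid, NULL,
                                 (void **)&memattr);
            if (status != EFI_SUCCESS)
                    return status;  /* protocol not offered by firmware */

            /* drop the no-execute attribute from [addr, addr + size) */
            return memattr->clear_memory_attributes(memattr, addr, size,
                                                    EFI_MEMORY_XP);
    }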

The only code that is being reused is the decompression library itself,
along with the minimal ELF parsing that is required to copy the ELF
segments in place, and the relocation processing that fixes up absolute
symbol references to refer to the correct virtual addresses.
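
For reference, the ELF segment copy in question is tiny - roughly the
following (a simplified sketch of the parse_elf() logic in
arch/x86/boot/compressed/misc.c, with error checking and the
non-relocatable case omitted):

    Elf64_Ehdr *ehdr = output;                /* decompressed image */
    Elf64_Phdr *phdr = output + ehdr->e_phoff;
    int i;

    for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
            if (phdr->p_type != PT_LOAD)
                    continue;
            /* move each segment to its runtime offset in the buffer */
            memmove(output + (phdr->p_paddr - LOAD_PHYSICAL_ADDR),
                    output + phdr->p_offset, phdr->p_filesz);
    }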

Note that some of Evgeniy's changes to clean up the PE/COFF header
generation will still be needed, but I've omitted those here for
brevity.

(*) IMHO the following developments are likely to occur:
- the Windows boot chain (along with third-party drivers) is cleaned up so
  that it never relies on memory being writable and executable at the
  same time when running under the EFI boot services;
- the EFI reference implementation gets updated to map all memory NX by
  default, and to require read-only permissions for executable mappings;
- BIOS vendors incorporate these changes into their codebases, and
  deploy them more widely than just the 'secure' SKUs;
- OEMs only care about the Windows sticker [5], so they only boot test
  Windows, which works fine in this more restricted context;
- Linux boot no longer works reliably on new hardware built for Windows
  unless we clean up our boot chain as well.

Cc: Evgeniy Baskov <baskov@ispras.ru>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Peter Jones <pjones@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joerg Roedel <jroedel@suse.de>

[0] https://lore.kernel.org/all/cover.1678785672.git.baskov@ispras.ru/
[1] https://lore.kernel.org/all/20230416120729.2470762-1-ardb@kernel.org/
[2] https://lore.kernel.org/all/20230424165726.2245548-1-ardb@kernel.org/
[3] https://lore.kernel.org/all/20230518231434.26080-1-kirill.shutemov@linux.intel.com/
[4] https://lore.kernel.org/all/20230508070330.582131-1-ardb@kernel.org/
[5] https://techcommunity.microsoft.com/t5/hardware-dev-center/new-uefi-ca-memory-mitigation-requirements-for-signing/ba-p/3608714
[6] https://lore.kernel.org/all/20230522071415.501717-1-ardb@kernel.org/
[7] https://lore.kernel.org/all/20230602101313.3557775-1-ardb@kernel.org/
[8] https://lore.kernel.org/all/6ff1f28af2829cc9aea357ebee285825f90a431f.1684340801.git.thomas.lendacky@amd.com/

Ard Biesheuvel (20):
  x86/efistub: Branch straight to kernel entry point from C code
  x86/efistub: Simplify and clean up handover entry code
  x86/decompressor: Avoid magic offsets for EFI handover entrypoint
  x86/efistub: Clear BSS in EFI handover protocol entrypoint
  x86/decompressor: Use proper sequence to take the address of the GOT
  x86/decompressor: Store boot_params pointer in callee save register
  x86/decompressor: Call trampoline as a normal function
  x86/decompressor: Use standard calling convention for trampoline
  x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
  x86/decompressor: Call trampoline directly from C code
  x86/decompressor: Only call the trampoline when changing paging levels
  x86/decompressor: Merge trampoline cleanup with switching code
  x86/efistub: Perform 4/5 level paging switch from the stub
  x86/efistub: Prefer EFI memory attributes protocol over DXE services
  decompress: Use 8 byte alignment
  x86/decompressor: Move global symbol references to C code
  x86/decompressor: Factor out kernel decompression and relocation
  efi/libstub: Add limit argument to efi_random_alloc()
  x86/efistub: Perform SNP feature test while running in the firmware
  x86/efistub: Avoid legacy decompressor when doing EFI boot

 Documentation/arch/x86/boot.rst                |   2 +-
 arch/x86/boot/compressed/Makefile              |   5 +
 arch/x86/boot/compressed/efi_mixed.S           | 107 +++-----
 arch/x86/boot/compressed/head_32.S             |  34 +--
 arch/x86/boot/compressed/head_64.S             | 222 ++++-----------
 arch/x86/boot/compressed/misc.c                |  44 ++-
 arch/x86/boot/compressed/pgtable.h             |   6 +-
 arch/x86/boot/compressed/pgtable_64.c          |  72 +++--
 arch/x86/boot/compressed/sev.c                 |  71 ++---
 arch/x86/include/asm/boot.h                    |   8 +
 arch/x86/include/asm/efi.h                     |   7 +-
 arch/x86/include/asm/sev.h                     |   6 +
 drivers/firmware/efi/libstub/Makefile          |   1 +
 drivers/firmware/efi/libstub/arm64-stub.c      |   2 +-
 drivers/firmware/efi/libstub/efi-stub-helper.c |   2 +
 drivers/firmware/efi/libstub/efistub.h         |   3 +-
 drivers/firmware/efi/libstub/randomalloc.c     |  10 +-
 drivers/firmware/efi/libstub/x86-5lvl.c        |  95 +++++++
 drivers/firmware/efi/libstub/x86-stub.c        | 285 +++++++++++---------
 drivers/firmware/efi/libstub/x86-stub.h        |  17 ++
 drivers/firmware/efi/libstub/zboot.c           |   2 +-
 include/linux/decompress/mm.h                  |   2 +-
 22 files changed, 511 insertions(+), 492 deletions(-)
 create mode 100644 drivers/firmware/efi/libstub/x86-5lvl.c
 create mode 100644 drivers/firmware/efi/libstub/x86-stub.h

-- 
2.39.2


* [PATCH v5 01/20] x86/efistub: Branch straight to kernel entry point from C code
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Instead of returning to the calling code in assembler that does nothing
more than perform an indirect call with the boot_params pointer in
register ESI/RSI, perform the jump directly from the EFI stub C code.
This will allow the asm entrypoint code to be dropped entirely in
subsequent patches.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/libstub/x86-stub.c | 21 ++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 220be75a5cdc1f4c..d4a730f053bdcbfa 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -803,10 +803,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
 	return EFI_SUCCESS;
 }
 
+static void __noreturn enter_kernel(unsigned long kernel_addr,
+				    struct boot_params *boot_params)
+{
+	/* enter decompressed kernel with boot_params pointer in RSI/ESI */
+	asm("jmp *%0"::"r"(kernel_addr), "S"(boot_params));
+
+	unreachable();
+}
+
 /*
- * On success, we return the address of startup_32, which has potentially been
- * relocated by efi_relocate_kernel.
- * On failure, we exit to the firmware via efi_exit instead of returning.
+ * On success, this routine will jump to the relocated image directly and never
+ * return.  On failure, it will exit to the firmware via efi_exit() instead of
+ * returning.
  */
 asmlinkage unsigned long efi_main(efi_handle_t handle,
 				  efi_system_table_t *sys_table_arg,
@@ -950,7 +959,11 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
 		goto fail;
 	}
 
-	return bzimage_addr;
+	if (IS_ENABLED(CONFIG_X86_64))
+		/* add offset of startup_64() */
+		bzimage_addr += 0x200;
+
+	enter_kernel(bzimage_addr, boot_params);
 fail:
 	efi_err("efi_main() failed!\n");
 
-- 
2.39.2


* [PATCH v5 02/20] x86/efistub: Simplify and clean up handover entry code
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Now that the EFI entry code in assembler is only used by the optional
and deprecated EFI handover protocol, and given that the EFI stub C code
no longer returns to it, most of it can simply be dropped.

While at it, clarify the symbol naming by merging efi_main() and
efi_stub_entry(), making the latter the shared entry point for all the
different boot modes that enter via the EFI stub.

The efi32_stub_entry() and efi64_stub_entry() names are referenced
explicitly by the tooling that populates the setup header, so these must
be retained, but can be emitted as aliases of efi_stub_entry() where
appropriate.
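
For readers unfamiliar with the mechanism: __alias() wraps GCC's alias
attribute, which attaches an additional linker-visible name to an
existing definition. A generic sketch of the pattern (not taken from
this patch):

    /* one function body ... */
    void entry(void)
    {
    }

    /* ... also emitted under a second symbol name */
    extern __alias(entry) void legacy_entry(void);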

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 Documentation/arch/x86/boot.rst         |  2 +-
 arch/x86/boot/compressed/efi_mixed.S    | 22 +++++++++++---------
 arch/x86/boot/compressed/head_32.S      | 11 ----------
 arch/x86/boot/compressed/head_64.S      | 12 ++---------
 drivers/firmware/efi/libstub/x86-stub.c | 20 ++++++++++++++----
 5 files changed, 31 insertions(+), 36 deletions(-)

diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
index 33520ecdb37abfda..cdbca15a4fc23833 100644
--- a/Documentation/arch/x86/boot.rst
+++ b/Documentation/arch/x86/boot.rst
@@ -1417,7 +1417,7 @@ execution context provided by the EFI firmware.
 
 The function prototype for the handover entry point looks like this::
 
-    efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp)
+    efi_stub_entry(void *handle, efi_system_table_t *table, struct boot_params *bp)
 
 'handle' is the EFI image handle passed to the boot loader by the EFI
 firmware, 'table' is the EFI system table - these are the first two
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 4ca70bf93dc0bdcd..dcc562c8f7f35162 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -26,8 +26,8 @@
  * When booting in 64-bit mode on 32-bit EFI firmware, startup_64_mixed_mode()
  * is the first thing that runs after switching to long mode. Depending on
  * whether the EFI handover protocol or the compat entry point was used to
- * enter the kernel, it will either branch to the 64-bit EFI handover
- * entrypoint at offset 0x390 in the image, or to the 64-bit EFI PE/COFF
+ * enter the kernel, it will either branch to the common 64-bit EFI stub
+ * entrypoint efi_stub_entry() directly, or via the 64-bit EFI PE/COFF
  * entrypoint efi_pe_entry(). In the former case, the bootloader must provide a
  * struct bootparams pointer as the third argument, so the presence of such a
  * pointer is used to disambiguate.
@@ -37,21 +37,23 @@
  *  | efi32_pe_entry   |---->|            |            |       +-----------+--+
  *  +------------------+     |            |     +------+----------------+  |
  *                           | startup_32 |---->| startup_64_mixed_mode |  |
- *  +------------------+     |            |     +------+----------------+  V
- *  | efi32_stub_entry |---->|            |            |     +------------------+
- *  +------------------+     +------------+            +---->| efi64_stub_entry |
- *                                                           +-------------+----+
- *                           +------------+     +----------+               |
- *                           | startup_64 |<----| efi_main |<--------------+
- *                           +------------+     +----------+
+ *  +------------------+     |            |     +------+----------------+  |
+ *  | efi32_stub_entry |---->|            |            |                   |
+ *  +------------------+     +------------+            |                   |
+ *                                                     V                   |
+ *                           +------------+     +----------------+         |
+ *                           | startup_64 |<----| efi_stub_entry |<--------+
+ *                           +------------+     +----------------+
  */
 SYM_FUNC_START(startup_64_mixed_mode)
 	lea	efi32_boot_args(%rip), %rdx
 	mov	0(%rdx), %edi
 	mov	4(%rdx), %esi
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
 	mov	8(%rdx), %edx		// saved bootparams pointer
 	test	%edx, %edx
-	jnz	efi64_stub_entry
+	jnz	efi_stub_entry
+#endif
 	/*
 	 * efi_pe_entry uses MS calling convention, which requires 32 bytes of
 	 * shadow space on the stack even if all arguments are passed in
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 987ae727cf9f0d04..8876ffe30e9a4819 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -150,17 +150,6 @@ SYM_FUNC_START(startup_32)
 	jmp	*%eax
 SYM_FUNC_END(startup_32)
 
-#ifdef CONFIG_EFI_STUB
-SYM_FUNC_START(efi32_stub_entry)
-	add	$0x4, %esp
-	movl	8(%esp), %esi	/* save boot_params pointer */
-	call	efi_main
-	/* efi_main returns the possibly relocated address of startup_32 */
-	jmp	*%eax
-SYM_FUNC_END(efi32_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
-#endif
-
 	.text
 SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 03c4328a88cbd5d0..71c1f40a7ac067b9 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -523,19 +523,11 @@ trampoline_return:
 	jmp	*%rax
 SYM_CODE_END(startup_64)
 
-#ifdef CONFIG_EFI_STUB
-#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
 	.org 0x390
-#endif
 SYM_FUNC_START(efi64_stub_entry)
-	and	$~0xf, %rsp			/* realign the stack */
-	movq	%rdx, %rbx			/* save boot_params pointer */
-	call	efi_main
-	movq	%rbx,%rsi
-	leaq	rva(startup_64)(%rax), %rax
-	jmp	*%rax
+	jmp	efi_stub_entry
 SYM_FUNC_END(efi64_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
 #endif
 
 	.text
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index d4a730f053bdcbfa..4e01278c3f3d55f3 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -817,9 +817,9 @@ static void __noreturn enter_kernel(unsigned long kernel_addr,
  * return.  On failure, it will exit to the firmware via efi_exit() instead of
  * returning.
  */
-asmlinkage unsigned long efi_main(efi_handle_t handle,
-				  efi_system_table_t *sys_table_arg,
-				  struct boot_params *boot_params)
+void __noreturn efi_stub_entry(efi_handle_t handle,
+			       efi_system_table_t *sys_table_arg,
+			       struct boot_params *boot_params)
 {
 	unsigned long bzimage_addr = (unsigned long)startup_32;
 	unsigned long buffer_start, buffer_end;
@@ -965,7 +965,19 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
 
 	enter_kernel(bzimage_addr, boot_params);
 fail:
-	efi_err("efi_main() failed!\n");
+	efi_err("efi_stub_entry() failed!\n");
 
 	efi_exit(handle, status);
 }
+
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#ifndef CONFIG_EFI_MIXED
+extern __alias(efi_stub_entry)
+void efi32_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+		      struct boot_params *boot_params);
+
+extern __alias(efi_stub_entry)
+void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+		      struct boot_params *boot_params);
+#endif
+#endif
-- 
2.39.2


* [PATCH v5 03/20] x86/decompressor: Avoid magic offsets for EFI handover entrypoint
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

The offset of the native 32-bit or 64-bit EFI handover protocol
entrypoint, relative to the respective startup_32/startup_64 address, is
recorded in boot_params as handover_offset, so that a Linux/x86-aware
EFI loader can find it there.

When mixed mode is enabled, this single field has to describe this
offset for both the 32-bit and 64-bit entrypoints, so their respective
relative offsets have to be identical. Given that startup_32 and
startup_64 are 0x200 bytes apart, and the EFI handover entrypoint
resides at a fixed offset, the 32-bit and 64-bit versions of those
entrypoints must be exactly 0x200 bytes apart as well.

Currently, hard-coded fixed offsets are used to ensure this, but it is
sufficient to emit the 64-bit entrypoint 0x200 bytes after the 32-bit
one, wherever it happens to reside. This allows this code (which is now
EFI mixed mode specific) to be moved into efi_mixed.S and out of the
startup code in head_64.S.
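
For reference, a loader that follows the handover protocol ends up doing
something like the hypothetical fragment below (the names are made up;
'code' stands for the load address of the protected-mode kernel), which
is why the 0x200 spacing between the two entrypoints has to hold:

    typedef void (*handover_fn)(void *handle, efi_system_table_t *table,
                                struct boot_params *bp);

    unsigned long entry = (unsigned long)code + hdr->handover_offset;

    if (efi_is_64bit())
            entry += 0x200;   /* startup_64 is 0x200 past startup_32 */

    ((handover_fn)entry)(handle, systab, boot_params);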

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/efi_mixed.S | 20 +++++++++++++++++++-
 arch/x86/boot/compressed/head_64.S   | 18 ------------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index dcc562c8f7f35162..9308b595f6f0a5de 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -140,6 +140,16 @@ SYM_FUNC_START(__efi64_thunk)
 SYM_FUNC_END(__efi64_thunk)
 
 	.code32
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+SYM_FUNC_START(efi32_stub_entry)
+	add	$0x4, %esp		/* Discard return address */
+	popl	%ecx
+	popl	%edx
+	popl	%esi
+	jmp	efi32_entry
+SYM_FUNC_END(efi32_stub_entry)
+#endif
+
 /*
  * EFI service pointer must be in %edi.
  *
@@ -220,7 +230,7 @@ SYM_FUNC_END(efi_enter32)
  * stub may still exit and return to the firmware using the Exit() EFI boot
  * service.]
  */
-SYM_FUNC_START(efi32_entry)
+SYM_FUNC_START_LOCAL(efi32_entry)
 	call	1f
 1:	pop	%ebx
 
@@ -320,6 +330,14 @@ SYM_FUNC_START(efi32_pe_entry)
 	RET
 SYM_FUNC_END(efi32_pe_entry)
 
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+	.org	efi32_stub_entry + 0x200
+	.code64
+SYM_FUNC_START_NOALIGN(efi64_stub_entry)
+	jmp	efi_stub_entry
+SYM_FUNC_END(efi64_stub_entry)
+#endif
+
 	.section ".rodata"
 	/* EFI loaded image protocol GUID */
 	.balign 4
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 71c1f40a7ac067b9..9f90661744741210 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -294,17 +294,6 @@ SYM_FUNC_START(startup_32)
 	lret
 SYM_FUNC_END(startup_32)
 
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
-	.org 0x190
-SYM_FUNC_START(efi32_stub_entry)
-	add	$0x4, %esp		/* Discard return address */
-	popl	%ecx
-	popl	%edx
-	popl	%esi
-	jmp	efi32_entry
-SYM_FUNC_END(efi32_stub_entry)
-#endif
-
 	.code64
 	.org 0x200
 SYM_CODE_START(startup_64)
@@ -523,13 +512,6 @@ trampoline_return:
 	jmp	*%rax
 SYM_CODE_END(startup_64)
 
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
-	.org 0x390
-SYM_FUNC_START(efi64_stub_entry)
-	jmp	efi_stub_entry
-SYM_FUNC_END(efi64_stub_entry)
-#endif
-
 	.text
 SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 
-- 
2.39.2


* [PATCH v5 04/20] x86/efistub: Clear BSS in EFI handover protocol entrypoint
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

The so-called EFI handover protocol is value-add from the distros that
permits a loader to simply copy a PE kernel image into memory and call
an alternative entrypoint that is described by an embedded boot_params
structure.

Most implementations of this protocol do not bother to check the PE
header for minimum alignment, section placement, etc., and therefore
also don't clear the image's BSS, or even allocate enough memory for it.

Allocating more memory on the fly is rather difficult, but at least
clear the BSS region explicitly when entering in this manner, so that
the EFI stub code does not get confused by global variables that were
not zero-initialized correctly.

When booting in mixed mode, this BSS clearing must occur before any
global state is created, so clear it in the 32-bit asm entry point.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/efi_mixed.S    | 14 +++++++++++++-
 drivers/firmware/efi/libstub/x86-stub.c | 13 +++++++++++--
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 9308b595f6f0a5de..8a02a151806df14c 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -142,6 +142,18 @@ SYM_FUNC_END(__efi64_thunk)
 	.code32
 #ifdef CONFIG_EFI_HANDOVER_PROTOCOL
 SYM_FUNC_START(efi32_stub_entry)
+	call	1f
+1:	popl	%ecx
+
+	/* Clear BSS */
+	xorl	%eax, %eax
+	leal	(_bss - 1b)(%ecx), %edi
+	leal	(_ebss - 1b)(%ecx), %ecx
+	subl	%edi, %ecx
+	shrl	$2, %ecx
+	cld
+	rep	stosl
+
 	add	$0x4, %esp		/* Discard return address */
 	popl	%ecx
 	popl	%edx
@@ -334,7 +346,7 @@ SYM_FUNC_END(efi32_pe_entry)
 	.org	efi32_stub_entry + 0x200
 	.code64
 SYM_FUNC_START_NOALIGN(efi64_stub_entry)
-	jmp	efi_stub_entry
+	jmp	efi_handover_entry
 SYM_FUNC_END(efi64_stub_entry)
 #endif
 
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 4e01278c3f3d55f3..517cd68ea86cb7f4 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -971,12 +971,21 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 }
 
 #ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+void efi_handover_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+			struct boot_params *boot_params)
+{
+	extern char _bss[], _ebss[];
+
+	memset(_bss, 0, _ebss - _bss);
+	efi_stub_entry(handle, sys_table_arg, boot_params);
+}
+
 #ifndef CONFIG_EFI_MIXED
-extern __alias(efi_stub_entry)
+extern __alias(efi_handover_entry)
 void efi32_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
 		      struct boot_params *boot_params);
 
-extern __alias(efi_stub_entry)
+extern __alias(efi_handover_entry)
 void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
 		      struct boot_params *boot_params);
 #endif
-- 
2.39.2


* [PATCH v5 05/20] x86/decompressor: Use proper sequence to take the address of the GOT
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

The 32-bit decompressor does not actually use a global offset table
(GOT), but as is common for 32-bit position independent code, it uses
the magic symbol _GLOBAL_OFFSET_TABLE_ as an anchor from which to derive
the actual runtime addresses of other symbols, using special @GOTOFF
symbol references that are resolved at link time, and populated with the
distance between the address of the magic _GLOBAL_OFFSET_TABLE_ anchor
and the address of the symbol in question.

This means _GLOBAL_OFFSET_TABLE_ is the only symbol whose actual runtime
address needs to be determined explicitly, which is one of the first
things that happens in startup_32. However, it does so by taking the
absolute address via the immediate field of an ADD instruction (plus a
small offset), which seems to defeat the point.

Fortunately, the assembler knows that _GLOBAL_OFFSET_TABLE_ is magic,
and emits a special relative relocation instead, and so the resulting
code works as expected. However, this is not obvious to someone reading
the code, and the use of LEA with an explicit relative addend is more
idiomatic, so use that instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_32.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8876ffe30e9a4819..3530465b5b85ccf3 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -58,7 +58,7 @@ SYM_FUNC_START(startup_32)
 	leal	(BP_scratch+4)(%esi), %esp
 	call	1f
 1:	popl	%edx
-	addl	$_GLOBAL_OFFSET_TABLE_+(.-1b), %edx
+	leal	(_GLOBAL_OFFSET_TABLE_ - 1b)(%edx), %edx
 
 	/* Load new GDT */
 	leal	gdt@GOTOFF(%edx), %eax
-- 
2.39.2


* [PATCH v5 06/20] x86/decompressor: Store boot_params pointer in callee save register
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Instead of pushing and popping %RSI several times to preserve the struct
boot_params pointer across the execution of the startup code, move it
into a callee-saved register before the first call into C, and copy it
back when needed.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_64.S | 34 +++++++-------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 9f90661744741210..2d1b0ee94929f7ec 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -405,10 +405,14 @@ SYM_CODE_START(startup_64)
 	lretq
 
 .Lon_kernel_cs:
+	/*
+	 * RSI holds a pointer to a boot_params structure provided by the
+	 * loader, and this needs to be preserved across C function calls. So
+	 * move it into a callee saved register.
+	 */
+	movq	%rsi, %r15
 
-	pushq	%rsi
 	call	load_stage1_idt
-	popq	%rsi
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 	/*
@@ -421,10 +425,8 @@ SYM_CODE_START(startup_64)
 	 * detection/setup to ensure that has been done in advance of any dependent
 	 * code.
 	 */
-	pushq	%rsi
-	movq	%rsi, %rdi		/* real mode address */
+	movq	%r15, %rdi		/* pass struct boot_params pointer */
 	call	sev_enable
-	popq	%rsi
 #endif
 
 	/*
@@ -437,13 +439,9 @@ SYM_CODE_START(startup_64)
 	 *   - Non zero RDX means trampoline needs to enable 5-level
 	 *     paging.
 	 *
-	 * RSI holds real mode data and needs to be preserved across
-	 * this function call.
 	 */
-	pushq	%rsi
-	movq	%rsi, %rdi		/* real mode address */
+	movq	%r15, %rdi		/* pass struct boot_params pointer */
 	call	paging_prepare
-	popq	%rsi
 
 	/* Save the trampoline address in RCX */
 	movq	%rax, %rcx
@@ -468,14 +466,9 @@ trampoline_return:
 	 *
 	 * RDI is address of the page table to use instead of page table
 	 * in trampoline memory (if required).
-	 *
-	 * RSI holds real mode data and needs to be preserved across
-	 * this function call.
 	 */
-	pushq	%rsi
 	leaq	rva(top_pgtable)(%rbx), %rdi
 	call	cleanup_trampoline
-	popq	%rsi
 
 	/* Zero EFLAGS */
 	pushq	$0
@@ -485,7 +478,6 @@ trampoline_return:
  * Copy the compressed kernel to the end of our buffer
  * where decompression in place becomes safe.
  */
-	pushq	%rsi
 	leaq	(_bss-8)(%rip), %rsi
 	leaq	rva(_bss-8)(%rbx), %rdi
 	movl	$(_bss - startup_32), %ecx
@@ -493,7 +485,6 @@ trampoline_return:
 	std
 	rep	movsq
 	cld
-	popq	%rsi
 
 	/*
 	 * The GDT may get overwritten either during the copy we just did or
@@ -525,30 +516,27 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	shrq	$3, %rcx
 	rep	stosq
 
-	pushq	%rsi
 	call	load_stage2_idt
 
 	/* Pass boot_params to initialize_identity_maps() */
-	movq	(%rsp), %rdi
+	movq	%r15, %rdi		/* pass struct boot_params pointer */
 	call	initialize_identity_maps
-	popq	%rsi
 
 /*
  * Do the extraction, and jump to the new kernel..
  */
-	pushq	%rsi			/* Save the real mode argument */
-	movq	%rsi, %rdi		/* real mode address */
+	movq	%r15, %rdi		/* pass struct boot_params pointer */
 	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
 	movl	input_len(%rip), %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
 	movl	output_len(%rip), %r9d	/* decompressed length, end of relocs */
 	call	extract_kernel		/* returns kernel entry point in %rax */
-	popq	%rsi
 
 /*
  * Jump to the decompressed kernel.
  */
+	movq	%r15, %rsi
 	jmp	*%rax
 SYM_FUNC_END(.Lrelocated)
 
-- 
2.39.2


* [PATCH v5 07/20] x86/decompressor: Call trampoline as a normal function
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Move the long return that switches to 32-bit mode into the trampoline
code itself, so that the trampoline can be called as an ordinary
function. This will allow it to be called directly from C code in a
subsequent patch.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_64.S | 25 +++++++++-----------
 arch/x86/boot/compressed/pgtable.h |  2 +-
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2d1b0ee94929f7ec..af45ddd8297a4a07 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -446,18 +446,9 @@ SYM_CODE_START(startup_64)
 	/* Save the trampoline address in RCX */
 	movq	%rax, %rcx
 
-	/*
-	 * Load the address of trampoline_return() into RDI.
-	 * It will be used by the trampoline to return to the main code.
-	 */
-	leaq	trampoline_return(%rip), %rdi
-
-	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
-	pushq	$__KERNEL32_CS
 	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
-	pushq	%rax
-	lretq
-trampoline_return:
+	call	*%rax
+
 	/* Restore the stack, the 32-bit trampoline uses its own stack */
 	leaq	rva(boot_stack_end)(%rbx), %rsp
 
@@ -540,16 +531,22 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	jmp	*%rax
 SYM_FUNC_END(.Lrelocated)
 
-	.code32
 /*
  * This is the 32-bit trampoline that will be copied over to low memory.
  *
- * RDI contains the return address (might be above 4G).
  * ECX contains the base address of the trampoline memory.
  * Non zero RDX means trampoline needs to enable 5-level paging.
  */
 SYM_CODE_START(trampoline_32bit_src)
-	/* Set up data and stack segments */
+	popq	%rdi
+	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
+	pushq	$__KERNEL32_CS
+	leaq	0f(%rip), %rax
+	pushq	%rax
+	lretq
+
+	.code32
+0:	/* Set up data and stack segments */
 	movl	$__KERNEL_DS, %eax
 	movl	%eax, %ds
 	movl	%eax, %ss
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index cc9b2529a08634b4..91dbb99203fbce2d 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,7 +6,7 @@
 #define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
 
 #define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE	0x80
+#define TRAMPOLINE_32BIT_CODE_SIZE	0xA0
 
 #define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
 
-- 
2.39.2


* [PATCH v5 08/20] x86/decompressor: Use standard calling convention for trampoline
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Update the trampoline code so its arguments are passed via RDI and RSI,
which matches the ordinary SysV calling convention for x86_64. This will
allow this code to be called directly from C.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_64.S | 30 +++++++++-----------
 arch/x86/boot/compressed/pgtable.h |  2 +-
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index af45ddd8297a4a07..a387cd80964e1a1e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -443,9 +443,9 @@ SYM_CODE_START(startup_64)
 	movq	%r15, %rdi		/* pass struct boot_params pointer */
 	call	paging_prepare
 
-	/* Save the trampoline address in RCX */
-	movq	%rax, %rcx
-
+	/* Pass the trampoline address and boolean flag as args #1 and #2 */
+	movq	%rax, %rdi
+	movq	%rdx, %rsi
 	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
 	call	*%rax
 
@@ -534,11 +534,11 @@ SYM_FUNC_END(.Lrelocated)
 /*
  * This is the 32-bit trampoline that will be copied over to low memory.
  *
- * ECX contains the base address of the trampoline memory.
- * Non zero RDX means trampoline needs to enable 5-level paging.
+ * EDI contains the base address of the trampoline memory.
+ * Non-zero ESI means trampoline needs to enable 5-level paging.
  */
 SYM_CODE_START(trampoline_32bit_src)
-	popq	%rdi
+	popq	%r8
 	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
 	pushq	$__KERNEL32_CS
 	leaq	0f(%rip), %rax
@@ -552,7 +552,7 @@ SYM_CODE_START(trampoline_32bit_src)
 	movl	%eax, %ss
 
 	/* Set up new stack */
-	leal	TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
+	leal	TRAMPOLINE_32BIT_STACK_END(%edi), %esp
 
 	/* Disable paging */
 	movl	%cr0, %eax
@@ -560,7 +560,7 @@ SYM_CODE_START(trampoline_32bit_src)
 	movl	%eax, %cr0
 
 	/* Check what paging mode we want to be in after the trampoline */
-	testl	%edx, %edx
+	testl	%esi, %esi
 	jz	1f
 
 	/* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
@@ -575,21 +575,17 @@ SYM_CODE_START(trampoline_32bit_src)
 	jz	3f
 2:
 	/* Point CR3 to the trampoline's new top level page table */
-	leal	TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+	leal	TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
 	movl	%eax, %cr3
 3:
 	/* Set EFER.LME=1 as a precaution in case hypervsior pulls the rug */
-	pushl	%ecx
-	pushl	%edx
 	movl	$MSR_EFER, %ecx
 	rdmsr
 	btsl	$_EFER_LME, %eax
 	/* Avoid writing EFER if no change was made (for TDX guest) */
 	jc	1f
 	wrmsr
-1:	popl	%edx
-	popl	%ecx
-
+1:
 #ifdef CONFIG_X86_MCE
 	/*
 	 * Preserve CR4.MCE if the kernel will enable #MC support.
@@ -606,14 +602,14 @@ SYM_CODE_START(trampoline_32bit_src)
 
 	/* Enable PAE and LA57 (if required) paging modes */
 	orl	$X86_CR4_PAE, %eax
-	testl	%edx, %edx
+	testl	%esi, %esi
 	jz	1f
 	orl	$X86_CR4_LA57, %eax
 1:
 	movl	%eax, %cr4
 
 	/* Calculate address of paging_enabled() once we are executing in the trampoline */
-	leal	.Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
+	leal	.Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%edi), %eax
 
 	/* Prepare the stack for far return to Long Mode */
 	pushl	$__KERNEL_CS
@@ -630,7 +626,7 @@ SYM_CODE_END(trampoline_32bit_src)
 	.code64
 SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
 	/* Return from the trampoline */
-	jmp	*%rdi
+	jmp	*%r8
 SYM_FUNC_END(.Lpaging_enabled)
 
 	/*
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 91dbb99203fbce2d..4e8cef135226bcbb 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -14,7 +14,7 @@
 
 extern unsigned long *trampoline_32bit;
 
-extern void trampoline_32bit_src(void *return_ptr);
+extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
 
 #endif /* __ASSEMBLER__ */
 #endif /* BOOT_COMPRESSED_PAGETABLE_H */
-- 
2.39.2


* [PATCH v5 09/20] x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

The 32-bit trampoline no longer uses the stack for anything except
performing a far return back to long mode. Currently, this stack is
placed in the same page that carries the trampoline code, which means
this page must be mapped writable and executable, and the stack is
therefore executable as well.

Replace the far return with a far jump, so that the return address can
be pre-calculated and patched into the code before it is called. This
removes the need for a stack entirely, and in a later patch, this will
be taken advantage of by removing writable permissions from (and adding
executable permissions to) this code page explicitly when booting via
the EFI stub.

Not touching the stack pointer also makes it more straightforward to
call the trampoline code as an ordinary 64-bit function from C code.
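
In C terms, the fixup that replaces the stack-based return is a
one-liner - a sketch of what the pgtable_64.c hunk below does, where
tramp_code is the 32-bit addressable copy of the template:

    /*
     * The LJMP immediate was linked as (.Lret - trampoline_32bit_src);
     * adding the runtime base turns it into the absolute address of
     * .Lret in the copied trampoline.
     */
    *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) +=
                                    (unsigned long)tramp_code;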

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_64.S    | 48 +++++++++-----------
 arch/x86/boot/compressed/pgtable.h    |  6 +--
 arch/x86/boot/compressed/pgtable_64.c | 12 ++++-
 3 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a387cd80964e1a1e..cdefafd456c70335 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -449,9 +449,6 @@ SYM_CODE_START(startup_64)
 	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
 	call	*%rax
 
-	/* Restore the stack, the 32-bit trampoline uses its own stack */
-	leaq	rva(boot_stack_end)(%rbx), %rsp
-
 	/*
 	 * cleanup_trampoline() would restore trampoline memory.
 	 *
@@ -537,24 +534,22 @@ SYM_FUNC_END(.Lrelocated)
  * EDI contains the base address of the trampoline memory.
  * Non-zero ESI means trampoline needs to enable 5-level paging.
  */
+	.section ".rodata", "a", @progbits
 SYM_CODE_START(trampoline_32bit_src)
-	popq	%r8
 	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
 	pushq	$__KERNEL32_CS
 	leaq	0f(%rip), %rax
 	pushq	%rax
 	lretq
 
+	/*
+	 * The 32-bit code below will do a far jump back to long mode and end
+	 * up here after reconfiguring the number of paging levels.
+	 */
+.Lret:	retq
+
 	.code32
-0:	/* Set up data and stack segments */
-	movl	$__KERNEL_DS, %eax
-	movl	%eax, %ds
-	movl	%eax, %ss
-
-	/* Set up new stack */
-	leal	TRAMPOLINE_32BIT_STACK_END(%edi), %esp
-
-	/* Disable paging */
+0:	/* Disable paging */
 	movl	%cr0, %eax
 	btrl	$X86_CR0_PG_BIT, %eax
 	movl	%eax, %cr0
@@ -608,26 +603,25 @@ SYM_CODE_START(trampoline_32bit_src)
 1:
 	movl	%eax, %cr4
 
-	/* Calculate address of paging_enabled() once we are executing in the trampoline */
-	leal	.Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%edi), %eax
-
-	/* Prepare the stack for far return to Long Mode */
-	pushl	$__KERNEL_CS
-	pushl	%eax
-
 	/* Enable paging again. */
 	movl	%cr0, %eax
 	btsl	$X86_CR0_PG_BIT, %eax
 	movl	%eax, %cr0
 
-	lret
+	/*
+	 * Return to the 64-bit calling code using LJMP rather than LRET, to
+	 * avoid the need for a 32-bit addressable stack. The destination
+	 * address will be adjusted after the template code is copied into a
+	 * 32-bit addressable buffer.
+	 */
+.Ljmp:	ljmpl	$__KERNEL_CS, $(.Lret - trampoline_32bit_src)
 SYM_CODE_END(trampoline_32bit_src)
 
-	.code64
-SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
-	/* Return from the trampoline */
-	jmp	*%r8
-SYM_FUNC_END(.Lpaging_enabled)
+/*
+ * This symbol is placed right after trampoline_32bit_src() so its address can
+ * be used to infer the size of the trampoline code.
+ */
+SYM_DATA(trampoline_ljmp_imm_offset, .word  .Ljmp + 1 - trampoline_32bit_src)
 
 	/*
          * The trampoline code has a size limit.
@@ -636,7 +630,7 @@ SYM_FUNC_END(.Lpaging_enabled)
 	 */
 	.org	trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
 
-	.code32
+	.text
 SYM_FUNC_START_LOCAL_NOALIGN(.Lno_longmode)
 	/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
 1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 4e8cef135226bcbb..131488f50af55d0a 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,9 +6,7 @@
 #define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
 
 #define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE	0xA0
-
-#define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE	0x80
 
 #ifndef __ASSEMBLER__
 
@@ -16,5 +14,7 @@ extern unsigned long *trampoline_32bit;
 
 extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
 
+extern const u16 trampoline_ljmp_imm_offset;
+
 #endif /* __ASSEMBLER__ */
 #endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff4111bf8c0..d66639c961b8eeda 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -109,6 +109,7 @@ static unsigned long find_trampoline_placement(void)
 struct paging_config paging_prepare(void *rmode)
 {
 	struct paging_config paging_config = {};
+	void *tramp_code;
 
 	/* Initialize boot_params. Required for cmdline_find_option_bool(). */
 	boot_params = rmode;
@@ -143,9 +144,18 @@ struct paging_config paging_prepare(void *rmode)
 	memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
 
 	/* Copy trampoline code in place */
-	memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+	tramp_code = memcpy(trampoline_32bit +
+			TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
 			&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
 
+	/*
+	 * Avoid the need for a stack in the 32-bit trampoline code, by using
+	 * LJMP rather than LRET to return back to long mode. LJMP takes an
+	 * immediate absolute address, which needs to be adjusted based on the
+	 * placement of the trampoline.
+	 */
+	*(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+
 	/*
 	 * The code below prepares page table in trampoline memory.
 	 *
-- 
2.39.2


* [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Instead of returning to the asm calling code to invoke the trampoline,
call it straight from the C code that sets the scene. That way, the
struct return type is no longer needed for returning two values, and the
call can be made conditional more cleanly in a subsequent patch.
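
Note the idiom this relies on: memcpy() returns its destination pointer,
so the freshly copied trampoline can be assigned straight to a typed
function pointer and invoked. A sketch (dst stands in for the 32-bit
addressable trampoline buffer):

    void (*toggle_la57)(void *trampoline, bool enable_5lvl);

    toggle_la57 = memcpy(dst, &trampoline_32bit_src,
                         TRAMPOLINE_32BIT_CODE_SIZE);
    /* ... fix up the LJMP immediate, prepare page tables ... */
    toggle_la57(trampoline_32bit, l5_required);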

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_64.S    | 20 +++-----------
 arch/x86/boot/compressed/pgtable_64.c | 28 ++++++++------------
 2 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index cdefafd456c70335..3d4da7e5270c8d4d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -430,24 +430,12 @@ SYM_CODE_START(startup_64)
 #endif
 
 	/*
-	 * paging_prepare() sets up the trampoline and checks if we need to
-	 * enable 5-level paging.
-	 *
-	 * paging_prepare() returns a two-quadword structure which lands
-	 * into RDX:RAX:
-	 *   - Address of the trampoline is returned in RAX.
-	 *   - Non zero RDX means trampoline needs to enable 5-level
-	 *     paging.
-	 *
+	 * set_paging_levels() updates the number of paging levels using a
+	 * trampoline in 32-bit addressable memory if the current number does
+	 * not match the desired number.
 	 */
 	movq	%r15, %rdi		/* pass struct boot_params pointer */
-	call	paging_prepare
-
-	/* Pass the trampoline address and boolean flag as args #1 and #2 */
-	movq	%rax, %rdi
-	movq	%rdx, %rsi
-	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
-	call	*%rax
+	call	set_paging_levels
 
 	/*
 	 * cleanup_trampoline() would restore trampoline memory.
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index d66639c961b8eeda..1d28ad95ea839531 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -16,11 +16,6 @@ unsigned int __section(".data") pgdir_shift = 39;
 unsigned int __section(".data") ptrs_per_p4d = 1;
 #endif
 
-struct paging_config {
-	unsigned long trampoline_start;
-	unsigned long l5_required;
-};
-
 /* Buffer to preserve trampoline memory */
 static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
 
@@ -106,10 +101,10 @@ static unsigned long find_trampoline_placement(void)
 	return bios_start - TRAMPOLINE_32BIT_SIZE;
 }
 
-struct paging_config paging_prepare(void *rmode)
+asmlinkage void set_paging_levels(void *rmode)
 {
-	struct paging_config paging_config = {};
-	void *tramp_code;
+	void (*toggle_la57)(void *trampoline, bool enable_5lvl);
+	bool l5_required = false;
 
 	/* Initialize boot_params. Required for cmdline_find_option_bool(). */
 	boot_params = rmode;
@@ -130,12 +125,10 @@ struct paging_config paging_prepare(void *rmode)
 			!cmdline_find_option_bool("no5lvl") &&
 			native_cpuid_eax(0) >= 7 &&
 			(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
-		paging_config.l5_required = 1;
+		l5_required = true;
 	}
 
-	paging_config.trampoline_start = find_trampoline_placement();
-
-	trampoline_32bit = (unsigned long *)paging_config.trampoline_start;
+	trampoline_32bit = (unsigned long *)find_trampoline_placement();
 
 	/* Preserve trampoline memory */
 	memcpy(trampoline_save, trampoline_32bit, TRAMPOLINE_32BIT_SIZE);
@@ -144,7 +137,7 @@ struct paging_config paging_prepare(void *rmode)
 	memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
 
 	/* Copy trampoline code in place */
-	tramp_code = memcpy(trampoline_32bit +
+	toggle_la57 = memcpy(trampoline_32bit +
 			TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
 			&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
 
@@ -154,7 +147,8 @@ struct paging_config paging_prepare(void *rmode)
 	 * immediate absolute address, which needs to be adjusted based on the
 	 * placement of the trampoline.
 	 */
-	*(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+	*(u32 *)((u8 *)toggle_la57 + trampoline_ljmp_imm_offset) +=
+						(unsigned long)toggle_la57;
 
 	/*
 	 * The code below prepares page table in trampoline memory.
@@ -170,10 +164,10 @@ struct paging_config paging_prepare(void *rmode)
 	 * We are not going to use the page table in trampoline memory if we
 	 * are already in the desired paging mode.
 	 */
-	if (paging_config.l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+	if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
 		goto out;
 
-	if (paging_config.l5_required) {
+	if (l5_required) {
 		/*
 		 * For 4- to 5-level paging transition, set up current CR3 as
 		 * the first and the only entry in a new top-level page table.
@@ -196,7 +190,7 @@ struct paging_config paging_prepare(void *rmode)
 	}
 
 out:
-	return paging_config;
+	toggle_la57(trampoline_32bit, l5_required);
 }
 
 void cleanup_trampoline(void *pgtable)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 11/20] x86/decompressor: Only call the trampoline when changing paging levels
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (9 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 12/20] x86/decompressor: Merge trampoline cleanup with switching code Ard Biesheuvel
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Since the current and desired number of paging levels are known when the
trampoline is being prepared, avoid calling the trampoline at all when it
is clear that doing so would not change the number of paging levels.
Given that the CPU is already running in long
mode, the PAE and LA57 settings are necessarily consistent with the
currently active page tables - the only difference is that CR4.MCE will
always be preserved in this case, but it will be cleared by the real
kernel startup code if CONFIG_X86_MCE is not enabled.
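
For illustration, a minimal sketch of the resulting early-exit check,
using the names from the patch below:

    bool la57_active = !!(native_read_cr4() & X86_CR4_LA57);

    /* Skip the trampoline if the active mode already matches. */
    if (l5_required == la57_active)
        return;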

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_64.S    | 21 +-------------------
 arch/x86/boot/compressed/pgtable_64.c | 18 +++++++----------
 2 files changed, 8 insertions(+), 31 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 3d4da7e5270c8d4d..577173be8ec805cd 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -387,10 +387,6 @@ SYM_CODE_START(startup_64)
 	 * For the trampoline, we need the top page table to reside in lower
 	 * memory as we don't have a way to load 64-bit values into CR3 in
 	 * 32-bit mode.
-	 *
-	 * We go though the trampoline even if we don't have to: if we're
-	 * already in a desired paging mode. This way the trampoline code gets
-	 * tested on every boot.
 	 */
 
 	/* Make sure we have GDT with 32-bit code segment */
@@ -542,25 +538,10 @@ SYM_CODE_START(trampoline_32bit_src)
 	btrl	$X86_CR0_PG_BIT, %eax
 	movl	%eax, %cr0
 
-	/* Check what paging mode we want to be in after the trampoline */
-	testl	%esi, %esi
-	jz	1f
-
-	/* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
-	movl	%cr4, %eax
-	testl	$X86_CR4_LA57, %eax
-	jnz	3f
-	jmp	2f
-1:
-	/* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
-	movl	%cr4, %eax
-	testl	$X86_CR4_LA57, %eax
-	jz	3f
-2:
 	/* Point CR3 to the trampoline's new top level page table */
 	leal	TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
 	movl	%eax, %cr3
-3:
+
 	/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
 	movl	$MSR_EFER, %ecx
 	rdmsr
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 1d28ad95ea839531..5b15d823e7010650 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -128,6 +128,13 @@ asmlinkage void set_paging_levels(void *rmode)
 		l5_required = true;
 	}
 
+	/*
+	 * The trampoline will not be used if the paging mode is already set to
+	 * the desired one.
+	 */
+	if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+		return;
+
 	trampoline_32bit = (unsigned long *)find_trampoline_placement();
 
 	/* Preserve trampoline memory */
@@ -155,18 +162,8 @@ asmlinkage void set_paging_levels(void *rmode)
 	 *
 	 * The new page table will be used by trampoline code for switching
 	 * from 4- to 5-level paging or vice versa.
-	 *
-	 * If switching is not required, the page table is unused: trampoline
-	 * code wouldn't touch CR3.
 	 */
 
-	/*
-	 * We are not going to use the page table in trampoline memory if we
-	 * are already in the desired paging mode.
-	 */
-	if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
-		goto out;
-
 	if (l5_required) {
 		/*
 		 * For 4- to 5-level paging transition, set up current CR3 as
@@ -189,7 +186,6 @@ asmlinkage void set_paging_levels(void *rmode)
 		       (void *)src, PAGE_SIZE);
 	}
 
-out:
 	toggle_la57(trampoline_32bit, l5_required);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 12/20] x86/decompressor: Merge trampoline cleanup with switching code
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (10 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 11/20] x86/decompressor: Only call the trampoline when changing paging levels Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub Ard Biesheuvel
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Now that the trampoline setup code and the actual invocation of it are
all done from the C routine, the trampoline cleanup can be merged into
it as well, instead of returning to asm just to call another C function.
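
Conceptually, the merged routine now covers the whole sequence; a rough
outline only, not the literal code (which follows below):

    asmlinkage void set_paging_levels(void *rmode, void *pgtable)
    {
        /* 1. decide whether a 4 <-> 5 level switch is needed    */
        /* 2. place and invoke the 32-bit trampoline if it is    */
        /* 3. move the top-level page table out of trampoline    */
        /*    memory and restore the saved trampoline contents   */
    }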

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_64.S    | 13 +++------
 arch/x86/boot/compressed/pgtable_64.c | 28 ++++++++------------
 2 files changed, 15 insertions(+), 26 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 577173be8ec805cd..408c7824b647ff51 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -429,19 +429,14 @@ SYM_CODE_START(startup_64)
 	 * set_paging_levels() updates the number of paging levels using a
 	 * trampoline in 32-bit addressable memory if the current number does
 	 * not match the desired number.
+	 *
+	 * RSI is the relocated address of the page table to use instead of
+	 * page table in trampoline memory (if required).
 	 */
 	movq	%r15, %rdi		/* pass struct boot_params pointer */
+	leaq	rva(top_pgtable)(%rbx), %rsi
 	call	set_paging_levels
 
-	/*
-	 * cleanup_trampoline() would restore trampoline memory.
-	 *
-	 * RDI is address of the page table to use instead of page table
-	 * in trampoline memory (if required).
-	 */
-	leaq	rva(top_pgtable)(%rbx), %rdi
-	call	cleanup_trampoline
-
 	/* Zero EFLAGS */
 	pushq	$0
 	popfq
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 5b15d823e7010650..b9a8aa1ff6d7515c 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -101,9 +101,10 @@ static unsigned long find_trampoline_placement(void)
 	return bios_start - TRAMPOLINE_32BIT_SIZE;
 }
 
-asmlinkage void set_paging_levels(void *rmode)
+asmlinkage void set_paging_levels(void *rmode, void *pgtable)
 {
 	void (*toggle_la57)(void *trampoline, bool enable_5lvl);
+	void *trampoline_pgtable;
 	bool l5_required = false;
 
 	/* Initialize boot_params. Required for cmdline_find_option_bool(). */
@@ -133,7 +134,7 @@ asmlinkage void set_paging_levels(void *rmode)
 	 * the desired one.
 	 */
 	if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
-		return;
+		goto out;
 
 	trampoline_32bit = (unsigned long *)find_trampoline_placement();
 
@@ -163,6 +164,8 @@ asmlinkage void set_paging_levels(void *rmode)
 	 * The new page table will be used by trampoline code for switching
 	 * from 4- to 5-level paging or vice versa.
 	 */
+	trampoline_pgtable = trampoline_32bit +
+			     TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long);
 
 	if (l5_required) {
 		/*
@@ -182,31 +185,21 @@ asmlinkage void set_paging_levels(void *rmode)
 		 * may be above 4G.
 		 */
 		src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
-		memcpy(trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
-		       (void *)src, PAGE_SIZE);
+		memcpy(trampoline_pgtable, (void *)src, PAGE_SIZE);
 	}
 
 	toggle_la57(trampoline_32bit, l5_required);
-}
-
-void cleanup_trampoline(void *pgtable)
-{
-	void *trampoline_pgtable;
-
-	trampoline_pgtable = trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long);
 
 	/*
-	 * Move the top level page table out of trampoline memory,
-	 * if it's there.
+	 * Move the top level page table out of trampoline memory.
 	 */
-	if ((void *)__native_read_cr3() == trampoline_pgtable) {
-		memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
-		native_write_cr3((unsigned long)pgtable);
-	}
+	memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
+	native_write_cr3((unsigned long)pgtable);
 
 	/* Restore trampoline memory */
 	memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);
 
+out:
 	/* Initialize variables for 5-level paging */
 #ifdef CONFIG_X86_5LEVEL
 	if (__read_cr4() & X86_CR4_LA57) {
@@ -215,4 +208,5 @@ void cleanup_trampoline(void *pgtable)
 		ptrs_per_p4d = 512;
 	}
 #endif
+	return;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (11 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 12/20] x86/decompressor: Merge trampoline cleanup with switching code Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07 20:19   ` Yunhong Jiang
  2023-06-07  7:23 ` [PATCH v5 14/20] x86/efistub: Prefer EFI memory attributes protocol over DXE services Ard Biesheuvel
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

In preparation for updating the EFI stub boot flow to avoid the bare
metal decompressor code altogether, implement the support code for
switching between 4 and 5 levels of paging before jumping to the kernel
proper.

This reuses the newly refactored trampoline that the bare metal
decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
memory and remap it with the appropriate permissions. Given that the
bare metal decompressor will no longer call into the trampoline if the
number of paging levels is already set correctly, there is no longer any
need to remove NX restrictions from the memory range where this
trampoline may end up.
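
As a sketch, placing the trampoline code and its root page table below
4 GiB boils down to a bounded stub allocation, mirroring the new
x86-5lvl.c code below:

    u8 *la57_code;
    efi_status_t status;

    /* one page for the code, one page for the root page table */
    status = efi_allocate_pages(2 * PAGE_SIZE,
                                (unsigned long *)&la57_code, U32_MAX);
    if (status != EFI_SUCCESS)
        return status;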

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/libstub/Makefile          |  1 +
 drivers/firmware/efi/libstub/efi-stub-helper.c |  2 +
 drivers/firmware/efi/libstub/efistub.h         |  1 +
 drivers/firmware/efi/libstub/x86-5lvl.c        | 95 ++++++++++++++++++++
 drivers/firmware/efi/libstub/x86-stub.c        | 40 +++------
 drivers/firmware/efi/libstub/x86-stub.h        | 17 ++++
 6 files changed, 130 insertions(+), 26 deletions(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 16d64a34d1e19465..ae8874401a9f1490 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB)	+= efi-stub.o string.o intrinsics.o systable.o \
 lib-$(CONFIG_ARM)		+= arm32-stub.o
 lib-$(CONFIG_ARM64)		+= arm64.o arm64-stub.o smbios.o
 lib-$(CONFIG_X86)		+= x86-stub.o
+lib-$(CONFIG_X86_64)		+= x86-5lvl.o
 lib-$(CONFIG_RISCV)		+= riscv.o riscv-stub.o
 lib-$(CONFIG_LOONGARCH)		+= loongarch.o loongarch-stub.o
 
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 1e0203d74691ffcc..51779279fbff21b5 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
 			efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
 		} else if (!strcmp(param, "noinitrd")) {
 			efi_noinitrd = true;
+		} else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
+			efi_no5lvl = true;
 		} else if (!strcmp(param, "efi") && val) {
 			efi_nochunk = parse_option_str(val, "nochunk");
 			efi_novamap |= parse_option_str(val, "novamap");
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 6aa38a1bf1265d83..06b7abc92ced9e18 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -33,6 +33,7 @@
 #define EFI_ALLOC_LIMIT		ULONG_MAX
 #endif
 
+extern bool efi_no5lvl;
 extern bool efi_nochunk;
 extern bool efi_nokaslr;
 extern int efi_loglevel;
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
new file mode 100644
index 0000000000000000..2428578a3ae08be7
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/efi.h>
+
+#include <asm/boot.h>
+#include <asm/desc.h>
+#include <asm/efi.h>
+
+#include "efistub.h"
+#include "x86-stub.h"
+
+bool efi_no5lvl;
+
+static void (*la57_toggle)(void *trampoline, bool enable_5lvl);
+
+static const struct desc_struct gdt[] = {
+	[GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+	[GDT_ENTRY_KERNEL_CS]   = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+};
+
+/*
+ * Enabling (or disabling) 5 level paging is tricky, because it can only be
+ * done from 32-bit mode with paging disabled. This means not only that the
+ * code itself must be running from 32-bit addressable physical memory, but
+ * also that the root page table must be 32-bit addressable, as programming
+ * a 64-bit value into CR3 when running in 32-bit mode is not supported.
+ */
+efi_status_t efi_setup_5level_paging(void)
+{
+	u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
+	efi_status_t status;
+	u8 *la57_code;
+
+	if (!efi_is_64bit())
+		return EFI_SUCCESS;
+
+	/* check for 5 level paging support */
+	if (native_cpuid_eax(0) < 7 ||
+	    !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+		return EFI_SUCCESS;
+
+	/* allocate some 32-bit addressable memory for code and a page table */
+	status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
+				    U32_MAX);
+	if (status != EFI_SUCCESS)
+		return status;
+
+	la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
+	memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
+
+	/*
+	 * To avoid the need to allocate a 32-bit addressable stack, the
+	 * trampoline uses a LJMP instruction to switch back to long mode.
+	 * LJMP takes an absolute destination address, which needs to be
+	 * fixed up at runtime.
+	 */
+	*(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
+
+	efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
+
+	return EFI_SUCCESS;
+}
+
+void efi_5level_switch(void)
+{
+	bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+	bool have_la57 = native_read_cr4() & X86_CR4_LA57;
+	bool need_toggle = want_la57 ^ have_la57;
+	u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
+	u64 *cr3 = (u64 *)__native_read_cr3();
+	u64 *new_cr3;
+
+	if (!la57_toggle || !need_toggle)
+		return;
+
+	if (!have_la57) {
+		/*
+		 * 5 level paging will be enabled, so a root level page needs
+		 * to be allocated from the 32-bit addressable physical region,
+		 * with its first entry referring to the existing hierarchy.
+		 */
+		new_cr3 = memset(pgt, 0, PAGE_SIZE);
+		new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
+	} else {
+		/* take the new root table pointer from the current entry #0 */
+		new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+
+		/* copy the new root table if it is not 32-bit addressable */
+		if ((u64)new_cr3 > U32_MAX)
+			new_cr3 = memcpy(pgt, new_cr3, PAGE_SIZE);
+	}
+
+	native_load_gdt(&(struct desc_ptr){ sizeof(gdt) - 1, (u64)gdt });
+
+	la57_toggle(new_cr3, want_la57);
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 517cd68ea86cb7f4..f48a2e795d885af8 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -17,6 +17,7 @@
 #include <asm/boot.h>
 
 #include "efistub.h"
+#include "x86-stub.h"
 
 /* Maximum physical address for 64-bit kernel with 4-level paging */
 #define MAXMEM_X86_64_4LEVEL (1ull << 46)
@@ -223,8 +224,8 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
 	}
 }
 
-static void
-adjust_memory_range_protection(unsigned long start, unsigned long size)
+void efi_adjust_memory_range_protection(unsigned long start,
+					unsigned long size)
 {
 	efi_status_t status;
 	efi_gcd_memory_space_desc_t desc;
@@ -278,35 +279,14 @@ adjust_memory_range_protection(unsigned long start, unsigned long size)
 	}
 }
 
-/*
- * Trampoline takes 2 pages and can be loaded in first megabyte of memory
- * with its end placed between 128k and 640k where BIOS might start.
- * (see arch/x86/boot/compressed/pgtable_64.c)
- *
- * We cannot find exact trampoline placement since memory map
- * can be modified by UEFI, and it can alter the computed address.
- */
-
-#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
-#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-
 void startup_32(struct boot_params *boot_params);
 
 static void
 setup_memory_protection(unsigned long image_base, unsigned long image_size)
 {
-	/*
-	 * Allow execution of possible trampoline used
-	 * for switching between 4- and 5-level page tables
-	 * and relocated kernel image.
-	 */
-
-	adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
-				       TRAMPOLINE_PLACEMENT_SIZE);
-
 #ifdef CONFIG_64BIT
 	if (image_base != (unsigned long)startup_32)
-		adjust_memory_range_protection(image_base, image_size);
+		efi_adjust_memory_range_protection(image_base, image_size);
 #else
 	/*
 	 * Clear protection flags on a whole range of possible
@@ -316,8 +296,8 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
 	 * need to remove possible protection on relocated image
 	 * itself disregarding further relocations.
 	 */
-	adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
-				       KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
+	efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
+					   KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
 #endif
 }
 
@@ -839,6 +819,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 		efi_dxe_table = NULL;
 	}
 
+	status = efi_setup_5level_paging();
+	if (status != EFI_SUCCESS) {
+		efi_err("efi_setup_5level_paging() failed!\n");
+		goto fail;
+	}
+
 	/*
 	 * If the kernel isn't already loaded at a suitable address,
 	 * relocate it.
@@ -959,6 +945,8 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 		goto fail;
 	}
 
+	efi_5level_switch();
+
 	if (IS_ENABLED(CONFIG_X86_64))
 		/* add offset of startup_64() */
 		bzimage_addr += 0x200;
diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
new file mode 100644
index 0000000000000000..37c5a36b9d8cf9b2
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-stub.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/efi.h>
+
+extern void trampoline_32bit_src(void *, bool);
+extern const u16 trampoline_ljmp_imm_offset;
+
+void efi_adjust_memory_range_protection(unsigned long start,
+					unsigned long size);
+
+#ifdef CONFIG_X86_64
+efi_status_t efi_setup_5level_paging(void);
+void efi_5level_switch(void);
+#else
+static inline efi_status_t efi_setup_5level_paging(void) { return EFI_SUCCESS; }
+static inline void efi_5level_switch(void) {}
+#endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 14/20] x86/efistub: Prefer EFI memory attributes protocol over DXE services
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (12 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 15/20] decompress: Use 8 byte alignment Ard Biesheuvel
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Currently, the EFI stub relies on DXE services in some cases to clear
non-execute restrictions from page allocations that need to be
executable. This is dodgy, because DXE services are not specified by
UEFI but by PI, and they are not intended for consumption by OS loaders.
However, no alternative existed at the time.

Now, there is a new UEFI protocol that should be used instead, so if it
exists, prefer it over the DXE services calls.
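
The resulting precedence is straightforward; a minimal sketch, where
memattr is the protocol pointer located at boot (as in the patch below):

    if (memattr != NULL) {
        /* UEFI memory attributes protocol: clear XP and be done */
        efi_call_proto(memattr, clear_memory_attributes, start,
                       size, EFI_MEMORY_XP);
        return;
    }

    /* otherwise fall back to the PI DXE services, if available */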

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/libstub/x86-stub.c | 29 ++++++++++++++------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index f48a2e795d885af8..abcd5703e9f3f980 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -26,6 +26,7 @@ const efi_system_table_t *efi_system_table;
 const efi_dxe_services_table_t *efi_dxe_table;
 u32 image_offset __section(".data");
 static efi_loaded_image_t *image = NULL;
+static efi_memory_attribute_protocol_t *memattr;
 
 typedef union sev_memory_acceptance_protocol sev_memory_acceptance_protocol_t;
 union sev_memory_acceptance_protocol {
@@ -233,12 +234,18 @@ void efi_adjust_memory_range_protection(unsigned long start,
 	unsigned long rounded_start, rounded_end;
 	unsigned long unprotect_start, unprotect_size;
 
-	if (efi_dxe_table == NULL)
-		return;
-
 	rounded_start = rounddown(start, EFI_PAGE_SIZE);
 	rounded_end = roundup(start + size, EFI_PAGE_SIZE);
 
+	if (memattr != NULL) {
+		efi_call_proto(memattr, clear_memory_attributes, rounded_start,
+			       rounded_end - rounded_start, EFI_MEMORY_XP);
+		return;
+	}
+
+	if (efi_dxe_table == NULL)
+		return;
+
 	/*
 	 * Don't modify memory region attributes, they are
 	 * already suitable, to lower the possibility to
@@ -801,6 +808,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 			       efi_system_table_t *sys_table_arg,
 			       struct boot_params *boot_params)
 {
+	efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
 	unsigned long bzimage_addr = (unsigned long)startup_32;
 	unsigned long buffer_start, buffer_end;
 	struct setup_header *hdr = &boot_params->hdr;
@@ -812,13 +820,18 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 	if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
 		efi_exit(handle, EFI_INVALID_PARAMETER);
 
-	efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
-	if (efi_dxe_table &&
-	    efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
-		efi_warn("Ignoring DXE services table: invalid signature\n");
-		efi_dxe_table = NULL;
+	if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
+		efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+		if (efi_dxe_table &&
+		    efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
+			efi_warn("Ignoring DXE services table: invalid signature\n");
+			efi_dxe_table = NULL;
+		}
 	}
 
+	/* grab the memory attributes protocol if it exists */
+	efi_bs_call(locate_protocol, &guid, NULL, (void **)&memattr);
+
 	status = efi_setup_5level_paging();
 	if (status != EFI_SUCCESS) {
 		efi_err("efi_setup_5level_paging() failed!\n");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 15/20] decompress: Use 8 byte alignment
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (13 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 14/20] x86/efistub: Prefer EFI memory attributes protocol over DXE services Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 16/20] x86/decompressor: Move global symbol references to C code Ard Biesheuvel
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

The ZSTD decompressor requires malloc() allocations to be 8-byte
aligned, so ensure that this is the case.
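
The bitmask idiom used here rounds a value up to the next multiple of a
power of two; for example:

    /* round up to an 8-byte boundary (8 is a power of two) */
    malloc_ptr = (malloc_ptr + 7) & ~7;  /* 0x1001 -> 0x1008, 0x1008 -> 0x1008 */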

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/decompress/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h
index 9192986b1a731323..ac862422df158bef 100644
--- a/include/linux/decompress/mm.h
+++ b/include/linux/decompress/mm.h
@@ -48,7 +48,7 @@ MALLOC_VISIBLE void *malloc(int size)
 	if (!malloc_ptr)
 		malloc_ptr = free_mem_ptr;
 
-	malloc_ptr = (malloc_ptr + 3) & ~3;     /* Align */
+	malloc_ptr = (malloc_ptr + 7) & ~7;     /* Align */
 
 	p = (void *)malloc_ptr;
 	malloc_ptr += size;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 16/20] x86/decompressor: Move global symbol references to C code
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (14 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 15/20] decompress: Use 8 byte alignment Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 17/20] x86/decompressor: Factor out kernel decompression and relocation Ard Biesheuvel
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

It is no longer necessary to be cautious when referring to global
variables in the position-independent decompressor code, now that it is
built using PIE codegen and makes an assertion in the linker script that
no GOT entries exist (which would require adjustment for the actual
runtime load address of the decompressor binary).

This means global variables can be referenced directly from C code,
instead of having to pass their runtime addresses into C routines from
asm code, which needs to happen at each call site. Do so for the code
that will be called directly from the EFI stub after a subsequent patch,
and avoid the need to duplicate this logic a third time.
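
In practice, the C code can now simply declare the symbols and use them;
a minimal sketch, mirroring the declarations added below (the helper is
hypothetical, for illustration only):

    extern unsigned char input_data[];
    extern unsigned int input_len, output_len;

    /* direct references - no runtime addresses passed in from asm */
    static unsigned long compressed_size(void)
    {
        return input_len;
    }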

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/head_32.S |  8 --------
 arch/x86/boot/compressed/head_64.S |  8 +-------
 arch/x86/boot/compressed/misc.c    | 16 +++++++++-------
 3 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 3530465b5b85ccf3..beee858058df4403 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -168,13 +168,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
  */
 	/* push arguments for extract_kernel: */
 
-	pushl	output_len@GOTOFF(%ebx)	/* decompressed length, end of relocs */
 	pushl	%ebp			/* output address */
-	pushl	input_len@GOTOFF(%ebx)	/* input_len */
-	leal	input_data@GOTOFF(%ebx), %eax
-	pushl	%eax			/* input_data */
-	leal	boot_heap@GOTOFF(%ebx), %eax
-	pushl	%eax			/* heap area */
 	pushl	%esi			/* real mode pointer */
 	call	extract_kernel		/* returns kernel entry point in %eax */
 	addl	$24, %esp
@@ -202,8 +196,6 @@ SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)
  */
 	.bss
 	.balign 4
-boot_heap:
-	.fill BOOT_HEAP_SIZE, 1, 0
 boot_stack:
 	.fill BOOT_STACK_SIZE, 1, 0
 boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 408c7824b647ff51..556ee504325ae50e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -493,11 +493,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
  * Do the extraction, and jump to the new kernel..
  */
 	movq	%r15, %rdi		/* pass struct boot_params pointer */
-	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
-	leaq	input_data(%rip), %rdx  /* input_data */
-	movl	input_len(%rip), %ecx	/* input_len */
-	movq	%rbp, %r8		/* output target address */
-	movl	output_len(%rip), %r9d	/* decompressed length, end of relocs */
+	movq	%rbp, %rsi		/* output target address */
 	call	extract_kernel		/* returns kernel entry point in %rax */
 
 /*
@@ -639,8 +635,6 @@ SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
  */
 	.bss
 	.balign 4
-SYM_DATA_LOCAL(boot_heap,	.fill BOOT_HEAP_SIZE, 1, 0)
-
 SYM_DATA_START_LOCAL(boot_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
 	.balign 16
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94b7abcf624b3b55..0b29a3183df42246 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -330,6 +330,11 @@ static size_t parse_elf(void *output)
 	return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
 }
 
+static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
+
+extern unsigned char input_data[];
+extern unsigned int input_len, output_len;
+
 /*
  * The compressed kernel image (ZO), has been moved so that its position
  * is against the end of the buffer used to hold the uncompressed kernel
@@ -347,14 +352,11 @@ static size_t parse_elf(void *output)
  *             |-------uncompressed kernel image---------|
  *
  */
-asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
-				  unsigned char *input_data,
-				  unsigned long input_len,
-				  unsigned char *output,
-				  unsigned long output_len)
+asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
 {
 	const unsigned long kernel_total_size = VO__end - VO__text;
 	unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+	memptr heap = (memptr)boot_heap;
 	unsigned long needed_size;
 	size_t entry_offset;
 
@@ -412,7 +414,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	 * entries. This ensures the full mapped area is usable RAM
 	 * and doesn't include any reserved areas.
 	 */
-	needed_size = max(output_len, kernel_total_size);
+	needed_size = max((unsigned long)output_len, kernel_total_size);
 #ifdef CONFIG_X86_64
 	needed_size = ALIGN(needed_size, MIN_KERNEL_ALIGN);
 #endif
@@ -443,7 +445,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 #ifdef CONFIG_X86_64
 	if (heap > 0x3fffffffffffUL)
 		error("Destination address too large");
-	if (virt_addr + max(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
+	if (virt_addr + needed_size > KERNEL_IMAGE_SIZE)
 		error("Destination virtual address is beyond the kernel mapping area");
 #else
 	if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 17/20] x86/decompressor: Factor out kernel decompression and relocation
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (15 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 16/20] x86/decompressor: Move global symbol references to C code Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 18/20] efi/libstub: Add limit argument to efi_random_alloc() Ard Biesheuvel
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Factor out the sequence that invokes the decompressor, parses the ELF
image and applies the relocations, so that it can be called directly
from the EFI stub.
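
A hypothetical call site then reduces to a single call, with error()
being the caller-supplied error handler:

    entry_offset = decompress_kernel(output, virt_addr, error);
    if (entry_offset == ULONG_MAX)
        error("Decompression failed");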

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/misc.c | 28 ++++++++++++++++----
 arch/x86/include/asm/boot.h     |  8 ++++++
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 0b29a3183df42246..93220a2a276b0ffb 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -330,11 +330,33 @@ static size_t parse_elf(void *output)
 	return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
 }
 
+const unsigned long kernel_total_size = VO__end - VO__text;
+
 static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
 
 extern unsigned char input_data[];
 extern unsigned int input_len, output_len;
 
+unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
+				void (*error)(char *x))
+{
+	unsigned long entry;
+
+	if (!free_mem_ptr) {
+		free_mem_ptr     = (unsigned long)boot_heap;
+		free_mem_end_ptr = (unsigned long)boot_heap + sizeof(boot_heap);
+	}
+
+	if (__decompress(input_data, input_len, NULL, NULL, outbuf, output_len,
+			 NULL, error) < 0)
+		return ULONG_MAX;
+
+	entry = parse_elf(outbuf);
+	handle_relocations(outbuf, output_len, virt_addr);
+
+	return entry;
+}
+
 /*
  * The compressed kernel image (ZO), has been moved so that its position
  * is against the end of the buffer used to hold the uncompressed kernel
@@ -354,7 +376,6 @@ extern unsigned int input_len, output_len;
  */
 asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
 {
-	const unsigned long kernel_total_size = VO__end - VO__text;
 	unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
 	memptr heap = (memptr)boot_heap;
 	unsigned long needed_size;
@@ -463,10 +484,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
 		accept_memory(__pa(output), __pa(output) + needed_size);
 	}
 
-	__decompress(input_data, input_len, NULL, NULL, output, output_len,
-			NULL, error);
-	entry_offset = parse_elf(output);
-	handle_relocations(output, output_len, virt_addr);
+	entry_offset = decompress_kernel(output, virt_addr, error);
 
 	debug_putstr("done.\nBooting the kernel (entry_offset: 0x");
 	debug_puthex(entry_offset);
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3160d..4ae14339cb8cc72d 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -62,4 +62,12 @@
 # define BOOT_STACK_SIZE	0x1000
 #endif
 
+#ifndef __ASSEMBLY__
+extern unsigned int output_len;
+extern const unsigned long kernel_total_size;
+
+unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
+				void (*error)(char *x));
+#endif
+
 #endif /* _ASM_X86_BOOT_H */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 18/20] efi/libstub: Add limit argument to efi_random_alloc()
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (16 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 17/20] x86/decompressor: Factor out kernel decompression and relocation Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware Ard Biesheuvel
  2023-06-07  7:23 ` [PATCH v5 20/20] x86/efistub: Avoid legacy decompressor when doing EFI boot Ard Biesheuvel
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

x86 will need to limit the kernel memory allocation to the lowest 512
MiB of memory, to match the behavior of the existing bare metal KASLR
physical randomization logic. So in preparation for that, add a limit
parameter to efi_random_alloc() and wire it up.
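
Callers that want the existing behavior keep passing EFI_ALLOC_LIMIT; a
sketch of a lowered-limit x86 call site (the 512 MiB figure reflects the
constraint mentioned above, and is an illustration only):

    status = efi_random_alloc(alloc_size, min_kimg_align, &addr,
                              seed, EFI_LOADER_CODE, SZ_512M - 1);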

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/firmware/efi/libstub/arm64-stub.c  |  2 +-
 drivers/firmware/efi/libstub/efistub.h     |  2 +-
 drivers/firmware/efi/libstub/randomalloc.c | 10 ++++++----
 drivers/firmware/efi/libstub/zboot.c       |  2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index 770b8ecb73984c61..8c40fc89f5f99209 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -106,7 +106,7 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
 		 */
 		status = efi_random_alloc(*reserve_size, min_kimg_align,
 					  reserve_addr, phys_seed,
-					  EFI_LOADER_CODE);
+					  EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
 		if (status != EFI_SUCCESS)
 			efi_warn("efi_random_alloc() failed: 0x%lx\n", status);
 	} else {
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 06b7abc92ced9e18..9823f6fb3e01f718 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -956,7 +956,7 @@ efi_status_t efi_get_random_bytes(unsigned long size, u8 *out);
 
 efi_status_t efi_random_alloc(unsigned long size, unsigned long align,
 			      unsigned long *addr, unsigned long random_seed,
-			      int memory_type);
+			      int memory_type, unsigned long alloc_limit);
 
 efi_status_t efi_random_get_seed(void);
 
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
index 32c7a54923b4c127..674a064b8f7adc68 100644
--- a/drivers/firmware/efi/libstub/randomalloc.c
+++ b/drivers/firmware/efi/libstub/randomalloc.c
@@ -16,7 +16,8 @@
  */
 static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
 					 unsigned long size,
-					 unsigned long align_shift)
+					 unsigned long align_shift,
+					 u64 alloc_limit)
 {
 	unsigned long align = 1UL << align_shift;
 	u64 first_slot, last_slot, region_end;
@@ -29,7 +30,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
 		return 0;
 
 	region_end = min(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - 1,
-			 (u64)EFI_ALLOC_LIMIT);
+			 alloc_limit);
 	if (region_end < size)
 		return 0;
 
@@ -54,7 +55,8 @@ efi_status_t efi_random_alloc(unsigned long size,
 			      unsigned long align,
 			      unsigned long *addr,
 			      unsigned long random_seed,
-			      int memory_type)
+			      int memory_type,
+			      unsigned long alloc_limit)
 {
 	unsigned long total_slots = 0, target_slot;
 	unsigned long total_mirrored_slots = 0;
@@ -76,7 +78,7 @@ efi_status_t efi_random_alloc(unsigned long size,
 		efi_memory_desc_t *md = (void *)map->map + map_offset;
 		unsigned long slots;
 
-		slots = get_entry_num_slots(md, size, ilog2(align));
+		slots = get_entry_num_slots(md, size, ilog2(align), alloc_limit);
 		MD_NUM_SLOTS(md) = slots;
 		total_slots += slots;
 		if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
diff --git a/drivers/firmware/efi/libstub/zboot.c b/drivers/firmware/efi/libstub/zboot.c
index e5d7fa1f1d8fd160..bdb17eac0cb401be 100644
--- a/drivers/firmware/efi/libstub/zboot.c
+++ b/drivers/firmware/efi/libstub/zboot.c
@@ -119,7 +119,7 @@ efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab)
 		}
 
 		status = efi_random_alloc(alloc_size, min_kimg_align, &image_base,
-					  seed, EFI_LOADER_CODE);
+					  seed, EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
 		if (status != EFI_SUCCESS) {
 			efi_err("Failed to allocate memory\n");
 			goto free_cmdline;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (17 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 18/20] efi/libstub: Add limit argument to efi_random_alloc() Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  2023-06-07 16:07   ` Tom Lendacky
  2023-06-07  7:23 ` [PATCH v5 20/20] x86/efistub: Avoid legacy decompressor when doing EFI boot Ard Biesheuvel
  19 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

Before refactoring the EFI stub boot flow to avoid the legacy bare metal
decompressor, duplicate the SNP feature check in the EFI stub before
handing over to the kernel proper.

The SNP feature check can be performed while running under the EFI boot
services, which means we can fail gracefully and return an error to the
bootloader if the loaded kernel does not implement support for all the
features that the hypervisor enabled.
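
The check itself is a mask operation on the SEV status MSR value; a
minimal sketch of how it is wired up in the stub (per the patch below):

    u64 unsupported = snp_get_unsupported_features(sev_get_status());

    if (unsupported)    /* enabled by the hypervisor, not implemented */
        efi_exit(handle, EFI_UNSUPPORTED);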

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/sev.c          | 71 +++++++++++---------
 arch/x86/include/asm/sev.h              |  4 ++
 drivers/firmware/efi/libstub/x86-stub.c | 17 +++++
 3 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 09dc8c187b3cc752..9593bc80c9c6b89d 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -367,6 +367,11 @@ static void enforce_vmpl0(void)
  */
 #define SNP_FEATURES_PRESENT (0)
 
+u64 snp_get_unsupported_features(u64 status)
+{
+	return status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+}
+
 void snp_check_features(void)
 {
 	u64 unsupported;
@@ -380,7 +385,7 @@ void snp_check_features(void)
 	 * EXIT_INFO_2 of the GHCB protocol so that those features can be reported
 	 * as part of the guest boot failure.
 	 */
-	unsupported = sev_status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+	unsupported = snp_get_unsupported_features(sev_status);
 	if (unsupported) {
 		if (ghcb_version < 2 || (!boot_ghcb && !early_setup_ghcb()))
 			sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
@@ -390,10 +395,42 @@ void snp_check_features(void)
 	}
 }
 
-void sev_enable(struct boot_params *bp)
+u64 sev_get_status(void)
 {
 	unsigned int eax, ebx, ecx, edx;
 	struct msr m;
+
+	/* Check for the SME/SEV support leaf */
+	eax = 0x80000000;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	if (eax < 0x8000001f)
+		return 0;
+
+	/*
+	 * Check for the SME/SEV feature:
+	 *   CPUID Fn8000_001F[EAX]
+	 *   - Bit 0 - Secure Memory Encryption support
+	 *   - Bit 1 - Secure Encrypted Virtualization support
+	 *   CPUID Fn8000_001F[EBX]
+	 *   - Bits 5:0 - Pagetable bit position used to indicate encryption
+	 */
+	eax = 0x8000001f;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	/* Check whether SEV is supported */
+	if (!(eax & BIT(1)))
+		return 0;
+
+	/* Set the SME mask if this is an SEV guest. */
+	sme_me_mask = BIT_ULL(ebx & 0x3f);
+
+	boot_rdmsr(MSR_AMD64_SEV, &m);
+	return m.q;
+}
+
+void sev_enable(struct boot_params *bp)
+{
 	bool snp;
 
 	/*
@@ -410,37 +447,13 @@ void sev_enable(struct boot_params *bp)
 	 */
 	snp = snp_init(bp);
 
-	/* Check for the SME/SEV support leaf */
-	eax = 0x80000000;
-	ecx = 0;
-	native_cpuid(&eax, &ebx, &ecx, &edx);
-	if (eax < 0x8000001f)
-		return;
-
-	/*
-	 * Check for the SME/SEV feature:
-	 *   CPUID Fn8000_001F[EAX]
-	 *   - Bit 0 - Secure Memory Encryption support
-	 *   - Bit 1 - Secure Encrypted Virtualization support
-	 *   CPUID Fn8000_001F[EBX]
-	 *   - Bits 5:0 - Pagetable bit position used to indicate encryption
-	 */
-	eax = 0x8000001f;
-	ecx = 0;
-	native_cpuid(&eax, &ebx, &ecx, &edx);
-	/* Check whether SEV is supported */
-	if (!(eax & BIT(1))) {
+	sev_status = sev_get_status();
+	if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
 		if (snp)
 			error("SEV-SNP support indicated by CC blob, but not CPUID.");
 		return;
 	}
 
-	/* Set the SME mask if this is an SEV guest. */
-	boot_rdmsr(MSR_AMD64_SEV, &m);
-	sev_status = m.q;
-	if (!(sev_status & MSR_AMD64_SEV_ENABLED))
-		return;
-
 	/* Negotiate the GHCB protocol version. */
 	if (sev_status & MSR_AMD64_SEV_ES_ENABLED) {
 		if (!sev_es_negotiate_protocol())
@@ -460,8 +473,6 @@ void sev_enable(struct boot_params *bp)
 
 	if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
 		error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
-
-	sme_me_mask = BIT_ULL(ebx & 0x3f);
 }
 
 /* Search for Confidential Computing blob in the EFI config table. */
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 86e1296e87f513b7..081c39b0e8d0d208 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -207,6 +207,8 @@ bool snp_init(struct boot_params *bp);
 void __init __noreturn snp_abort(void);
 int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
 void snp_accept_memory(phys_addr_t start, phys_addr_t end);
+u64 snp_get_unsupported_features(u64 status);
+u64 sev_get_status(void);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -232,6 +234,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
 }
 
 static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
+static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
+static inline u64 sev_get_status(void) { return 0; }
 #endif
 
 #endif
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index abcd5703e9f3f980..1015ef883f5850a4 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,6 +15,7 @@
 #include <asm/setup.h>
 #include <asm/desc.h>
 #include <asm/boot.h>
+#include <asm/sev.h>
 
 #include "efistub.h"
 #include "x86-stub.h"
@@ -790,6 +791,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
 	return EFI_SUCCESS;
 }
 
+static bool have_unsupported_snp_features(void)
+{
+	u64 unsupported;
+
+	unsupported = snp_get_unsupported_features(sev_get_status());
+	if (unsupported) {
+		efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
+			unsupported);
+		return true;
+	}
+	return false;
+}
+
 static void __noreturn enter_kernel(unsigned long kernel_addr,
 				    struct boot_params *boot_params)
 {
@@ -820,6 +834,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 	if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
 		efi_exit(handle, EFI_INVALID_PARAMETER);
 
+	if (have_unsupported_snp_features())
+		efi_exit(handle, EFI_UNSUPPORTED);
+
 	if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
 		efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
 		if (efi_dxe_table &&
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v5 20/20] x86/efistub: Avoid legacy decompressor when doing EFI boot
  2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
                   ` (18 preceding siblings ...)
  2023-06-07  7:23 ` [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware Ard Biesheuvel
@ 2023-06-07  7:23 ` Ard Biesheuvel
  19 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07  7:23 UTC (permalink / raw)
  To: linux-efi
  Cc: linux-kernel, Ard Biesheuvel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

The bare metal decompressor code was never really intended to run in a
hosted environment such as the EFI boot services, and does a few things
that are problematic in the context of EFI boot now that the logo
requirements are getting tighter.

In particular, the decompressor moves its own executable image around in
memory and relies on demand paging to populate the identity mappings.
Both are difficult to support in a context where memory is not permitted
to be mapped writable and executable at the same time or, at the very
least, is mapped non-executable by default and requires special
treatment for this restriction to be lifted.

Since EFI already maps all of memory 1:1, it is unnecessary to create
new page tables or handle page faults when decompressing the kernel.
That means there is also no need to replace the special exception
handlers for SEV. Generally, there is little need to do anything that
the decompressor does beyond

- initialize SEV encryption, if needed,
- perform the 4/5 level paging switch, if needed,
- decompress the kernel, and
- relocate the kernel.

So do all of this from the EFI stub code, and avoid the bare metal
decompressor altogether.
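
Schematically, and only as an outline of the four steps above (using
helper names introduced earlier in the series, not necessarily in
execution order):

    /*   sev_enable()          - SEV/SNP initialization           */
    /*   efi_5level_switch()   - 4/5 level paging switch          */
    /*   decompress_kernel()   - decompression plus relocation    */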

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/Makefile       |   5 +
 arch/x86/boot/compressed/efi_mixed.S    |  55 -------
 arch/x86/boot/compressed/head_32.S      |  13 --
 arch/x86/boot/compressed/head_64.S      |  27 ----
 arch/x86/include/asm/efi.h              |   7 +-
 arch/x86/include/asm/sev.h              |   2 +
 drivers/firmware/efi/libstub/x86-stub.c | 167 +++++++++-----------
 7 files changed, 84 insertions(+), 192 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index b13a580210867ffb..535608fe72e11265 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -74,6 +74,11 @@ LDFLAGS_vmlinux += -z noexecstack
 ifeq ($(CONFIG_LD_IS_BFD),y)
 LDFLAGS_vmlinux += $(call ld-option,--no-warn-rwx-segments)
 endif
+ifeq ($(CONFIG_EFI_STUB),y)
+# ensure that the static EFI stub library will be pulled in, even if it is
+# never referenced explicitly from the startup code
+LDFLAGS_vmlinux += -u efi_pe_entry
+endif
 LDFLAGS_vmlinux += -T
 
 hostprogs	:= mkpiggy
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 8a02a151806df14c..f4e22ef774ab6b4a 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -269,10 +269,6 @@ SYM_FUNC_START_LOCAL(efi32_entry)
 	jmp	startup_32
 SYM_FUNC_END(efi32_entry)
 
-#define ST32_boottime		60 // offsetof(efi_system_table_32_t, boottime)
-#define BS32_handle_protocol	88 // offsetof(efi_boot_services_32_t, handle_protocol)
-#define LI32_image_base		32 // offsetof(efi_loaded_image_32_t, image_base)
-
 /*
  * efi_status_t efi32_pe_entry(efi_handle_t image_handle,
  *			       efi_system_table_32_t *sys_table)
@@ -280,8 +276,6 @@ SYM_FUNC_END(efi32_entry)
 SYM_FUNC_START(efi32_pe_entry)
 	pushl	%ebp
 	movl	%esp, %ebp
-	pushl	%eax				// dummy push to allocate loaded_image
-
 	pushl	%ebx				// save callee-save registers
 	pushl	%edi
 
@@ -290,48 +284,8 @@ SYM_FUNC_START(efi32_pe_entry)
 	movl	$0x80000003, %eax		// EFI_UNSUPPORTED
 	jnz	2f
 
-	call	1f
-1:	pop	%ebx
-
-	/* Get the loaded image protocol pointer from the image handle */
-	leal	-4(%ebp), %eax
-	pushl	%eax				// &loaded_image
-	leal	(loaded_image_proto - 1b)(%ebx), %eax
-	pushl	%eax				// pass the GUID address
-	pushl	8(%ebp)				// pass the image handle
-
-	/*
-	 * Note the alignment of the stack frame.
-	 *   sys_table
-	 *   handle             <-- 16-byte aligned on entry by ABI
-	 *   return address
-	 *   frame pointer
-	 *   loaded_image       <-- local variable
-	 *   saved %ebx		<-- 16-byte aligned here
-	 *   saved %edi
-	 *   &loaded_image
-	 *   &loaded_image_proto
-	 *   handle             <-- 16-byte aligned for call to handle_protocol
-	 */
-
-	movl	12(%ebp), %eax			// sys_table
-	movl	ST32_boottime(%eax), %eax	// sys_table->boottime
-	call	*BS32_handle_protocol(%eax)	// sys_table->boottime->handle_protocol
-	addl	$12, %esp			// restore argument space
-	testl	%eax, %eax
-	jnz	2f
-
 	movl	8(%ebp), %ecx			// image_handle
 	movl	12(%ebp), %edx			// sys_table
-	movl	-4(%ebp), %esi			// loaded_image
-	movl	LI32_image_base(%esi), %esi	// loaded_image->image_base
-	leal	(startup_32 - 1b)(%ebx), %ebp	// runtime address of startup_32
-	/*
-	 * We need to set the image_offset variable here since startup_32() will
-	 * use it before we get to the 64-bit efi_pe_entry() in C code.
-	 */
-	subl	%esi, %ebp			// calculate image_offset
-	movl	%ebp, (image_offset - 1b)(%ebx)	// save image_offset
 	xorl	%esi, %esi
 	jmp	efi32_entry			// pass %ecx, %edx, %esi
 						// no other registers remain live
@@ -350,15 +304,6 @@ SYM_FUNC_START_NOALIGN(efi64_stub_entry)
 SYM_FUNC_END(efi64_stub_entry)
 #endif
 
-	.section ".rodata"
-	/* EFI loaded image protocol GUID */
-	.balign 4
-SYM_DATA_START_LOCAL(loaded_image_proto)
-	.long	0x5b1b31a1
-	.word	0x9562, 0x11d2
-	.byte	0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b
-SYM_DATA_END(loaded_image_proto)
-
 	.data
 	.balign	8
 SYM_DATA_START_LOCAL(efi32_boot_gdt)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index beee858058df4403..cd9587fcd5084f22 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -84,19 +84,6 @@ SYM_FUNC_START(startup_32)
 
 #ifdef CONFIG_RELOCATABLE
 	leal	startup_32@GOTOFF(%edx), %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32() will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry() will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- *	image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
-	subl    image_offset@GOTOFF(%edx), %ebx
-#endif
-
 	movl	BP_kernel_alignment(%esi), %eax
 	decl	%eax
 	addl    %eax, %ebx
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 556ee504325ae50e..1027e28ced8f2d52 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -146,19 +146,6 @@ SYM_FUNC_START(startup_32)
 
 #ifdef CONFIG_RELOCATABLE
 	movl	%ebp, %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- *	image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
-	subl    rva(image_offset)(%ebp), %ebx
-#endif
-
 	movl	BP_kernel_alignment(%esi), %eax
 	decl	%eax
 	addl	%eax, %ebx
@@ -335,20 +322,6 @@ SYM_CODE_START(startup_64)
 	/* Start with the delta to where the kernel will run at. */
 #ifdef CONFIG_RELOCATABLE
 	leaq	startup_32(%rip) /* - $startup_32 */, %rbp
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- *	image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
-	movl    image_offset(%rip), %eax
-	subq	%rax, %rbp
-#endif
-
 	movl	BP_kernel_alignment(%rsi), %eax
 	decl	%eax
 	addq	%rax, %rbp
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8b4be7cecdb8eb73..b0994ae3bc23f84d 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,6 +90,8 @@ static inline void efi_fpu_end(void)
 }
 
 #ifdef CONFIG_X86_32
+#define EFI_X86_KERNEL_ALLOC_LIMIT		(SZ_512M - 1)
+
 #define arch_efi_call_virt_setup()					\
 ({									\
 	efi_fpu_begin();						\
@@ -103,8 +105,7 @@ static inline void efi_fpu_end(void)
 })
 
 #else /* !CONFIG_X86_32 */
-
-#define EFI_LOADER_SIGNATURE	"EL64"
+#define EFI_X86_KERNEL_ALLOC_LIMIT		EFI_ALLOC_LIMIT
 
 extern asmlinkage u64 __efi_call(void *fp, ...);
 
@@ -218,6 +219,8 @@ efi_status_t efi_set_virtual_address_map(unsigned long memory_map_size,
 
 #ifdef CONFIG_EFI_MIXED
 
+#define EFI_ALLOC_LIMIT		(efi_is_64bit() ? ULONG_MAX : U32_MAX)
+
 #define ARCH_HAS_EFISTUB_WRAPPERS
 
 static inline bool efi_is_64bit(void)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 081c39b0e8d0d208..2cc2ef57e1241a87 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -161,6 +161,7 @@ static __always_inline void sev_es_nmi_complete(void)
 		__sev_es_nmi_complete();
 }
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern void sev_enable(struct boot_params *bp);
 
 static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs)
 {
@@ -215,6 +216,7 @@ static inline void sev_es_ist_exit(void) { }
 static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
 static inline void sev_es_nmi_complete(void) { }
 static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline void sev_enable(struct boot_params *bp) { }
 static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
 static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
 static inline void setup_ghcb(void) { }
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 1015ef883f5850a4..ae1a9b4c1af6191f 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,17 +15,14 @@
 #include <asm/setup.h>
 #include <asm/desc.h>
 #include <asm/boot.h>
+#include <asm/kaslr.h>
 #include <asm/sev.h>
 
 #include "efistub.h"
 #include "x86-stub.h"
 
-/* Maximum physical address for 64-bit kernel with 4-level paging */
-#define MAXMEM_X86_64_4LEVEL (1ull << 46)
-
 const efi_system_table_t *efi_system_table;
 const efi_dxe_services_table_t *efi_dxe_table;
-u32 image_offset __section(".data");
 static efi_loaded_image_t *image = NULL;
 static efi_memory_attribute_protocol_t *memattr;
 
@@ -287,28 +284,6 @@ void efi_adjust_memory_range_protection(unsigned long start,
 	}
 }
 
-void startup_32(struct boot_params *boot_params);
-
-static void
-setup_memory_protection(unsigned long image_base, unsigned long image_size)
-{
-#ifdef CONFIG_64BIT
-	if (image_base != (unsigned long)startup_32)
-		efi_adjust_memory_range_protection(image_base, image_size);
-#else
-	/*
-	 * Clear protection flags on a whole range of possible
-	 * addresses used for KASLR. We don't need to do that
-	 * on x86_64, since KASLR/extraction is performed after
-	 * dedicated identity page tables are built and we only
-	 * need to remove possible protection on relocated image
-	 * itself disregarding further relocations.
-	 */
-	efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
-					   KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
-#endif
-}
-
 static void setup_unaccepted_memory(void)
 {
 	efi_guid_t mem_acceptance_proto = OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID;
@@ -334,9 +309,7 @@ static void setup_unaccepted_memory(void)
 
 static const efi_char16_t apple[] = L"Apple";
 
-static void setup_quirks(struct boot_params *boot_params,
-			 unsigned long image_base,
-			 unsigned long image_size)
+static void setup_quirks(struct boot_params *boot_params)
 {
 	efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
 		efi_table_attr(efi_system_table, fw_vendor);
@@ -345,9 +318,6 @@ static void setup_quirks(struct boot_params *boot_params,
 		if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
 			retrieve_apple_device_properties(boot_params);
 	}
-
-	if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
-		setup_memory_protection(image_base, image_size);
 }
 
 /*
@@ -500,7 +470,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
 	}
 
 	image_base = efi_table_attr(image, image_base);
-	image_offset = (void *)startup_32 - image_base;
 
 	status = efi_allocate_pages(sizeof(struct boot_params),
 				    (unsigned long *)&boot_params, ULONG_MAX);
@@ -804,6 +773,61 @@ static bool have_unsupported_snp_features(void)
 	return false;
 }
 
+static void efi_get_seed(void *seed, int size)
+{
+	efi_get_random_bytes(size, seed);
+
+	/*
+	 * This only updates seed[0] when running on 32-bit, but in that case,
+	 * seed[1] is not used anyway, as there is no virtual KASLR on 32-bit.
+	 */
+	*(unsigned long *)seed ^= kaslr_get_random_long("EFI");
+}
+
+static void error(char *str)
+{
+	efi_warn("Decompression failed: %s\n", str);
+}
+
+static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
+{
+	unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+	unsigned long addr, alloc_size, entry;
+	efi_status_t status;
+	u32 seed[2] = {};
+
+	/* determine the required size of the allocation */
+	alloc_size = ALIGN(max((unsigned long)output_len, kernel_total_size),
+			   MIN_KERNEL_ALIGN);
+
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) {
+		u64 range = KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR - kernel_total_size;
+
+		efi_get_seed(seed, sizeof(seed));
+
+		virt_addr += (range * seed[1]) >> 32;
+		virt_addr &= ~(CONFIG_PHYSICAL_ALIGN - 1);
+	}
+
+	status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
+				  seed[0], EFI_LOADER_CODE,
+				  EFI_X86_KERNEL_ALLOC_LIMIT);
+	if (status != EFI_SUCCESS)
+		return status;
+
+	entry = decompress_kernel((void *)addr, virt_addr, error);
+	if (entry == ULONG_MAX) {
+		efi_free(alloc_size, addr);
+		return EFI_LOAD_ERROR;
+	}
+
+	*kernel_entry = addr + entry;
+
+	efi_adjust_memory_range_protection(addr, kernel_total_size);
+
+	return EFI_SUCCESS;
+}
+
 static void __noreturn enter_kernel(unsigned long kernel_addr,
 				    struct boot_params *boot_params)
 {
@@ -823,10 +847,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 			       struct boot_params *boot_params)
 {
 	efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
-	unsigned long bzimage_addr = (unsigned long)startup_32;
-	unsigned long buffer_start, buffer_end;
 	struct setup_header *hdr = &boot_params->hdr;
 	const struct linux_efi_initrd *initrd = NULL;
+	unsigned long kernel_entry;
 	efi_status_t status;
 
 	efi_system_table = sys_table_arg;
@@ -855,60 +878,6 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 		goto fail;
 	}
 
-	/*
-	 * If the kernel isn't already loaded at a suitable address,
-	 * relocate it.
-	 *
-	 * It must be loaded above LOAD_PHYSICAL_ADDR.
-	 *
-	 * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
-	 * is defined as the macro MAXMEM, but unfortunately that is not a
-	 * compile-time constant if 5-level paging is configured, so we instead
-	 * define our own macro for use here.
-	 *
-	 * For 32-bit, the maximum address is complicated to figure out, for
-	 * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
-	 * KASLR uses.
-	 *
-	 * Also relocate it if image_offset is zero, i.e. the kernel wasn't
-	 * loaded by LoadImage, but rather by a bootloader that called the
-	 * handover entry. The reason we must always relocate in this case is
-	 * to handle the case of systemd-boot booting a unified kernel image,
-	 * which is a PE executable that contains the bzImage and an initrd as
-	 * COFF sections. The initrd section is placed after the bzImage
-	 * without ensuring that there are at least init_size bytes available
-	 * for the bzImage, and thus the compressed kernel's startup code may
-	 * overwrite the initrd unless it is moved out of the way.
-	 */
-
-	buffer_start = ALIGN(bzimage_addr - image_offset,
-			     hdr->kernel_alignment);
-	buffer_end = buffer_start + hdr->init_size;
-
-	if ((buffer_start < LOAD_PHYSICAL_ADDR)				     ||
-	    (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE)    ||
-	    (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
-	    (image_offset == 0)) {
-		extern char _bss[];
-
-		status = efi_relocate_kernel(&bzimage_addr,
-					     (unsigned long)_bss - bzimage_addr,
-					     hdr->init_size,
-					     hdr->pref_address,
-					     hdr->kernel_alignment,
-					     LOAD_PHYSICAL_ADDR);
-		if (status != EFI_SUCCESS) {
-			efi_err("efi_relocate_kernel() failed!\n");
-			goto fail;
-		}
-		/*
-		 * Now that we've copied the kernel elsewhere, we no longer
-		 * have a set up block before startup_32(), so reset image_offset
-		 * to zero in case it was set earlier.
-		 */
-		image_offset = 0;
-	}
-
 #ifdef CONFIG_CMDLINE_BOOL
 	status = efi_parse_options(CONFIG_CMDLINE);
 	if (status != EFI_SUCCESS) {
@@ -926,6 +895,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 		}
 	}
 
+	status = efi_decompress_kernel(&kernel_entry);
+	if (status != EFI_SUCCESS) {
+		efi_err("Failed to decompress kernel\n");
+		goto fail;
+	}
+
 	/*
 	 * At this point, an initrd may already have been loaded by the
 	 * bootloader and passed via bootparams. We permit an initrd loaded
@@ -965,7 +940,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 
 	setup_efi_pci(boot_params);
 
-	setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
+	setup_quirks(boot_params);
 
 	setup_unaccepted_memory();
 
@@ -975,13 +950,15 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
 		goto fail;
 	}
 
+	/*
+	 * Call the SEV init code while still running with the firmware's
+	 * GDT/IDT, so #VC exceptions will be handled by EFI.
+	 */
+	sev_enable(boot_params);
+
 	efi_5level_switch();
 
-	if (IS_ENABLED(CONFIG_X86_64))
-		/* add offset of startup_64() */
-		bzimage_addr += 0x200;
-
-	enter_kernel(bzimage_addr, boot_params);
+	enter_kernel(kernel_entry, boot_params);
 fail:
 	efi_err("efi_stub_entry() failed!\n");
 
-- 
2.39.2



* Re: [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware
  2023-06-07  7:23 ` [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware Ard Biesheuvel
@ 2023-06-07 16:07   ` Tom Lendacky
  2023-06-07 16:51     ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Lendacky @ 2023-06-07 16:07 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-efi
  Cc: linux-kernel, Evgeniy Baskov, Borislav Petkov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On 6/7/23 02:23, Ard Biesheuvel wrote:
> Before refactoring the EFI stub boot flow to avoid the legacy bare metal
> decompressor, duplicate the SNP feature check in the EFI stub before
> handing over to the kernel proper.
> 
> The SNP feature check can be performed while running under the EFI boot
> services, which means we can fail gracefully and return an error to the
> bootloader if the loaded kernel does not implement support for all the
> features that the hypervisor enabled.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>   arch/x86/boot/compressed/sev.c          | 71 +++++++++++---------
>   arch/x86/include/asm/sev.h              |  4 ++
>   drivers/firmware/efi/libstub/x86-stub.c | 17 +++++
>   3 files changed, 62 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 09dc8c187b3cc752..9593bc80c9c6b89d 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c

...

> -void sev_enable(struct boot_params *bp)
> +u64 sev_get_status(void)
>   {
>   	unsigned int eax, ebx, ecx, edx;
>   	struct msr m;
> +
> +	/* Check for the SME/SEV support leaf */
> +	eax = 0x80000000;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if (eax < 0x8000001f)
> +		return 0;
> +
> +	/*
> +	 * Check for the SME/SEV feature:
> +	 *   CPUID Fn8000_001F[EAX]
> +	 *   - Bit 0 - Secure Memory Encryption support
> +	 *   - Bit 1 - Secure Encrypted Virtualization support
> +	 *   CPUID Fn8000_001F[EBX]
> +	 *   - Bits 5:0 - Pagetable bit position used to indicate encryption
> +	 */
> +	eax = 0x8000001f;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	/* Check whether SEV is supported */
> +	if (!(eax & BIT(1)))
> +		return 0;
> +
> +	/* Set the SME mask if this is an SEV guest. */
> +	sme_me_mask = BIT_ULL(ebx & 0x3f);
> +
> +	boot_rdmsr(MSR_AMD64_SEV, &m);
> +	return m.q;
> +}
> +
> +void sev_enable(struct boot_params *bp)
> +{
>   	bool snp;
>   
>   	/*
> @@ -410,37 +447,13 @@ void sev_enable(struct boot_params *bp)
>   	 */
>   	snp = snp_init(bp);
>   
> -	/* Check for the SME/SEV support leaf */
> -	eax = 0x80000000;
> -	ecx = 0;
> -	native_cpuid(&eax, &ebx, &ecx, &edx);
> -	if (eax < 0x8000001f)
> -		return;
> -
> -	/*
> -	 * Check for the SME/SEV feature:
> -	 *   CPUID Fn8000_001F[EAX]
> -	 *   - Bit 0 - Secure Memory Encryption support
> -	 *   - Bit 1 - Secure Encrypted Virtualization support
> -	 *   CPUID Fn8000_001F[EBX]
> -	 *   - Bits 5:0 - Pagetable bit position used to indicate encryption
> -	 */
> -	eax = 0x8000001f;
> -	ecx = 0;
> -	native_cpuid(&eax, &ebx, &ecx, &edx);
> -	/* Check whether SEV is supported */
> -	if (!(eax & BIT(1))) {
> +	sev_status = sev_get_status();
> +	if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
>   		if (snp)
>   			error("SEV-SNP support indicated by CC blob, but not CPUID.");

This ends up checking the CPUID path, because if SEV isn't advertised in 
CPUID the returned status value is 0. But it checks the SEV_STATUS MSR as 
well. So I think you can remove the SNP / SEV_STATUS check at the end of 
this function (since that check is now identical to this one) and just 
update the message to say that support was indicated by neither CPUID nor 
the SEV status MSR.

The sme_me_mask should probably be cleared at this point before returning, 
too. Alternatively, in sev_get_status(), you can make the setting of 
sme_me_mask conditional on MSR_AMD64_SEV_ENABLED being set in the SEV_STATUS MSR.
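
E.g., the first option would look something like this (untested sketch, 
on top of this patch):

	sev_status = sev_get_status();
	if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
		if (snp)
			error("SEV-SNP support indicated by CC blob, but not CPUID.");
		/* not an SEV guest after all, undo what sev_get_status() did */
		sme_me_mask = 0;
		return;
	}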

>   		return;
>   	}
>   
> -	/* Set the SME mask if this is an SEV guest. */
> -	boot_rdmsr(MSR_AMD64_SEV, &m);
> -	sev_status = m.q;
> -	if (!(sev_status & MSR_AMD64_SEV_ENABLED))
> -		return;
> -
>   	/* Negotiate the GHCB protocol version. */
>   	if (sev_status & MSR_AMD64_SEV_ES_ENABLED) {
>   		if (!sev_es_negotiate_protocol())
> @@ -460,8 +473,6 @@ void sev_enable(struct boot_params *bp)
>   
>   	if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
>   		error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
> -
> -	sme_me_mask = BIT_ULL(ebx & 0x3f);
>   }
>   
>   /* Search for Confidential Computing blob in the EFI config table. */
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 86e1296e87f513b7..081c39b0e8d0d208 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -207,6 +207,8 @@ bool snp_init(struct boot_params *bp);
>   void __init __noreturn snp_abort(void);
>   int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
>   void snp_accept_memory(phys_addr_t start, phys_addr_t end);
> +u64 snp_get_unsupported_features(u64 status);
> +u64 sev_get_status(void);
>   #else
>   static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>   static inline void sev_es_ist_exit(void) { }
> @@ -232,6 +234,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
>   }
>   
>   static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
> +static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
> +static inline u64 sev_get_status(void) { return 0; }
>   #endif
>   
>   #endif
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index abcd5703e9f3f980..1015ef883f5850a4 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -15,6 +15,7 @@
>   #include <asm/setup.h>
>   #include <asm/desc.h>
>   #include <asm/boot.h>
> +#include <asm/sev.h>
>   
>   #include "efistub.h"
>   #include "x86-stub.h"
> @@ -790,6 +791,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
>   	return EFI_SUCCESS;
>   }
>   
> +static bool have_unsupported_snp_features(void)
> +{
> +	u64 unsupported;
> +
> +	unsupported = snp_get_unsupported_features(sev_get_status());

This will also set sme_me_mask, but I think that is ok: on error things 
will terminate, and otherwise sev_enable() will update it appropriately 
later.

Thanks,
Tom

> +	if (unsupported) {
> +		efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
> +			unsupported);
> +		return true;
> +	}
> +	return false;
> +}
> +
>   static void __noreturn enter_kernel(unsigned long kernel_addr,
>   				    struct boot_params *boot_params)
>   {
> @@ -820,6 +834,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
>   	if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
>   		efi_exit(handle, EFI_INVALID_PARAMETER);
>   
> +	if (have_unsupported_snp_features())
> +		efi_exit(handle, EFI_UNSUPPORTED);
> +
>   	if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
>   		efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>   		if (efi_dxe_table &&


* Re: [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware
  2023-06-07 16:07   ` Tom Lendacky
@ 2023-06-07 16:51     ` Ard Biesheuvel
  2023-06-07 17:29       ` Tom Lendacky
  0 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07 16:51 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Wed, 7 Jun 2023 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 6/7/23 02:23, Ard Biesheuvel wrote:
> > Before refactoring the EFI stub boot flow to avoid the legacy bare metal
> > decompressor, duplicate the SNP feature check in the EFI stub before
> > handing over to the kernel proper.
> >
> > The SNP feature check can be performed while running under the EFI boot
> > services, which means we can fail gracefully and return an error to the
> > bootloader if the loaded kernel does not implement support for all the
> > features that the hypervisor enabled.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >   arch/x86/boot/compressed/sev.c          | 71 +++++++++++---------
> >   arch/x86/include/asm/sev.h              |  4 ++
> >   drivers/firmware/efi/libstub/x86-stub.c | 17 +++++
> >   3 files changed, 62 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> > index 09dc8c187b3cc752..9593bc80c9c6b89d 100644
> > --- a/arch/x86/boot/compressed/sev.c
> > +++ b/arch/x86/boot/compressed/sev.c
>
> ...
>
> > -void sev_enable(struct boot_params *bp)
> > +u64 sev_get_status(void)
> >   {
> >       unsigned int eax, ebx, ecx, edx;
> >       struct msr m;
> > +
> > +     /* Check for the SME/SEV support leaf */
> > +     eax = 0x80000000;
> > +     ecx = 0;
> > +     native_cpuid(&eax, &ebx, &ecx, &edx);
> > +     if (eax < 0x8000001f)
> > +             return 0;
> > +
> > +     /*
> > +      * Check for the SME/SEV feature:
> > +      *   CPUID Fn8000_001F[EAX]
> > +      *   - Bit 0 - Secure Memory Encryption support
> > +      *   - Bit 1 - Secure Encrypted Virtualization support
> > +      *   CPUID Fn8000_001F[EBX]
> > +      *   - Bits 5:0 - Pagetable bit position used to indicate encryption
> > +      */
> > +     eax = 0x8000001f;
> > +     ecx = 0;
> > +     native_cpuid(&eax, &ebx, &ecx, &edx);
> > +     /* Check whether SEV is supported */
> > +     if (!(eax & BIT(1)))
> > +             return 0;
> > +
> > +     /* Set the SME mask if this is an SEV guest. */
> > +     sme_me_mask = BIT_ULL(ebx & 0x3f);
> > +
> > +     boot_rdmsr(MSR_AMD64_SEV, &m);
> > +     return m.q;
> > +}
> > +
> > +void sev_enable(struct boot_params *bp)
> > +{
> >       bool snp;
> >
> >       /*
> > @@ -410,37 +447,13 @@ void sev_enable(struct boot_params *bp)
> >        */
> >       snp = snp_init(bp);
> >
> > -     /* Check for the SME/SEV support leaf */
> > -     eax = 0x80000000;
> > -     ecx = 0;
> > -     native_cpuid(&eax, &ebx, &ecx, &edx);
> > -     if (eax < 0x8000001f)
> > -             return;
> > -
> > -     /*
> > -      * Check for the SME/SEV feature:
> > -      *   CPUID Fn8000_001F[EAX]
> > -      *   - Bit 0 - Secure Memory Encryption support
> > -      *   - Bit 1 - Secure Encrypted Virtualization support
> > -      *   CPUID Fn8000_001F[EBX]
> > -      *   - Bits 5:0 - Pagetable bit position used to indicate encryption
> > -      */
> > -     eax = 0x8000001f;
> > -     ecx = 0;
> > -     native_cpuid(&eax, &ebx, &ecx, &edx);
> > -     /* Check whether SEV is supported */
> > -     if (!(eax & BIT(1))) {
> > +     sev_status = sev_get_status();
> > +     if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
> >               if (snp)
> >                       error("SEV-SNP support indicated by CC blob, but not CPUID.");
>
> This ends up checking the CPUID path, because if SEV isn't advertised in
> CPUID the returned status value is 0. But it checks the SEV_STATUS MSR as
> well. So I think you can remove the SNP / SEV_STATUS check at the end of
> this function (since that check is now identical to this one) and just
> update the message to say that support was indicated by neither CPUID nor
> the SEV status MSR.
>

But that one checks for MSR_AMD64_SEV_SNP_ENABLED not
MSR_AMD64_SEV_ENABLED. Does that matter at all?

> The sme_me_mask should probably be cleared at this point before returning,
> too. Alternatively, in sev_get_status(), you can make the setting of
> sme_me_mask conditional on MSR_AMD64_SEV_ENABLED being set in the SEV_STATUS MSR.
>

I'll go for the latter, seems cleaner not to touch it in that case.

--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -422,10 +422,12 @@ u64 sev_get_status(void)
        if (!(eax & BIT(1)))
                return 0;

-       /* Set the SME mask if this is an SEV guest. */
-       sme_me_mask = BIT_ULL(ebx & 0x3f);
-
        boot_rdmsr(MSR_AMD64_SEV, &m);
+
+       /* Set the SME mask if this is an SEV guest. */
+       if (m.q & MSR_AMD64_SEV_ENABLED)
+               sme_me_mask = BIT_ULL(ebx & 0x3f);
+
        return m.q;
 }


> >   /* Search for Confidential Computing blob in the EFI config table. */
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index 86e1296e87f513b7..081c39b0e8d0d208 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -207,6 +207,8 @@ bool snp_init(struct boot_params *bp);
> >   void __init __noreturn snp_abort(void);
> >   int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
> >   void snp_accept_memory(phys_addr_t start, phys_addr_t end);
> > +u64 snp_get_unsupported_features(u64 status);
> > +u64 sev_get_status(void);
> >   #else
> >   static inline void sev_es_ist_enter(struct pt_regs *regs) { }
> >   static inline void sev_es_ist_exit(void) { }
> > @@ -232,6 +234,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
> >   }
> >
> >   static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
> > +static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
> > +static inline u64 sev_get_status(void) { return 0; }
> >   #endif
> >
> >   #endif
> > diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> > index abcd5703e9f3f980..1015ef883f5850a4 100644
> > --- a/drivers/firmware/efi/libstub/x86-stub.c
> > +++ b/drivers/firmware/efi/libstub/x86-stub.c
> > @@ -15,6 +15,7 @@
> >   #include <asm/setup.h>
> >   #include <asm/desc.h>
> >   #include <asm/boot.h>
> > +#include <asm/sev.h>
> >
> >   #include "efistub.h"
> >   #include "x86-stub.h"
> > @@ -790,6 +791,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
> >       return EFI_SUCCESS;
> >   }
> >
> > +static bool have_unsupported_snp_features(void)
> > +{
> > +     u64 unsupported;
> > +
> > +     unsupported = snp_get_unsupported_features(sev_get_status());
>
> This will also set sme_me_mask, but I think that is ok: on error things
> will terminate, and otherwise sev_enable() will update it appropriately
> later.
>

OK

> > +     if (unsupported) {
> > +             efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
> > +                     unsupported);
> > +             return true;
> > +     }
> > +     return false;
> > +}
> > +
> >   static void __noreturn enter_kernel(unsigned long kernel_addr,
> >                                   struct boot_params *boot_params)
> >   {
> > @@ -820,6 +834,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
> >       if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
> >               efi_exit(handle, EFI_INVALID_PARAMETER);
> >
> > +     if (have_unsupported_snp_features())
> > +             efi_exit(handle, EFI_UNSUPPORTED);
> > +
> >       if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> >               efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> >               if (efi_dxe_table &&


* Re: [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware
  2023-06-07 16:51     ` Ard Biesheuvel
@ 2023-06-07 17:29       ` Tom Lendacky
  0 siblings, 0 replies; 42+ messages in thread
From: Tom Lendacky @ 2023-06-07 17:29 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On 6/7/23 11:51, Ard Biesheuvel wrote:
> On Wed, 7 Jun 2023 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 6/7/23 02:23, Ard Biesheuvel wrote:
>>> Before refactoring the EFI stub boot flow to avoid the legacy bare metal
>>> decompressor, duplicate the SNP feature check in the EFI stub before
>>> handing over to the kernel proper.
>>>
>>> The SNP feature check can be performed while running under the EFI boot
>>> services, which means we can fail gracefully and return an error to the
>>> bootloader if the loaded kernel does not implement support for all the
>>> features that the hypervisor enabled.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>> ---
>>>    arch/x86/boot/compressed/sev.c          | 71 +++++++++++---------
>>>    arch/x86/include/asm/sev.h              |  4 ++
>>>    drivers/firmware/efi/libstub/x86-stub.c | 17 +++++
>>>    3 files changed, 62 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
>>> index 09dc8c187b3cc752..9593bc80c9c6b89d 100644
>>> --- a/arch/x86/boot/compressed/sev.c
>>> +++ b/arch/x86/boot/compressed/sev.c
>>
>> ...
>>
>>> -void sev_enable(struct boot_params *bp)
>>> +u64 sev_get_status(void)
>>>    {
>>>        unsigned int eax, ebx, ecx, edx;
>>>        struct msr m;
>>> +
>>> +     /* Check for the SME/SEV support leaf */
>>> +     eax = 0x80000000;
>>> +     ecx = 0;
>>> +     native_cpuid(&eax, &ebx, &ecx, &edx);
>>> +     if (eax < 0x8000001f)
>>> +             return 0;
>>> +
>>> +     /*
>>> +      * Check for the SME/SEV feature:
>>> +      *   CPUID Fn8000_001F[EAX]
>>> +      *   - Bit 0 - Secure Memory Encryption support
>>> +      *   - Bit 1 - Secure Encrypted Virtualization support
>>> +      *   CPUID Fn8000_001F[EBX]
>>> +      *   - Bits 5:0 - Pagetable bit position used to indicate encryption
>>> +      */
>>> +     eax = 0x8000001f;
>>> +     ecx = 0;
>>> +     native_cpuid(&eax, &ebx, &ecx, &edx);
>>> +     /* Check whether SEV is supported */
>>> +     if (!(eax & BIT(1)))
>>> +             return 0;
>>> +
>>> +     /* Set the SME mask if this is an SEV guest. */
>>> +     sme_me_mask = BIT_ULL(ebx & 0x3f);
>>> +
>>> +     boot_rdmsr(MSR_AMD64_SEV, &m);
>>> +     return m.q;
>>> +}
>>> +
>>> +void sev_enable(struct boot_params *bp)
>>> +{
>>>        bool snp;
>>>
>>>        /*
>>> @@ -410,37 +447,13 @@ void sev_enable(struct boot_params *bp)
>>>         */
>>>        snp = snp_init(bp);
>>>
>>> -     /* Check for the SME/SEV support leaf */
>>> -     eax = 0x80000000;
>>> -     ecx = 0;
>>> -     native_cpuid(&eax, &ebx, &ecx, &edx);
>>> -     if (eax < 0x8000001f)
>>> -             return;
>>> -
>>> -     /*
>>> -      * Check for the SME/SEV feature:
>>> -      *   CPUID Fn8000_001F[EAX]
>>> -      *   - Bit 0 - Secure Memory Encryption support
>>> -      *   - Bit 1 - Secure Encrypted Virtualization support
>>> -      *   CPUID Fn8000_001F[EBX]
>>> -      *   - Bits 5:0 - Pagetable bit position used to indicate encryption
>>> -      */
>>> -     eax = 0x8000001f;
>>> -     ecx = 0;
>>> -     native_cpuid(&eax, &ebx, &ecx, &edx);
>>> -     /* Check whether SEV is supported */
>>> -     if (!(eax & BIT(1))) {
>>> +     sev_status = sev_get_status();
>>> +     if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
>>>                if (snp)
>>>                        error("SEV-SNP support indicated by CC blob, but not CPUID.");
>>
>> This ends up checking the CPUID path, because if SEV isn't advertised in
>> CPUID the returned status value is 0. But it checks the SEV_STATUS MSR as
>> well. So I think you can remove the SNP / SEV_STATUS check at the end of
>> this function (since that check is now identical to this one) and just
>> update the message to say that support was indicated by neither CPUID nor
>> the SEV status MSR.
>>
> 
> But that one checks for MSR_AMD64_SEV_SNP_ENABLED not
> MSR_AMD64_SEV_ENABLED. Does that matter at all?

Ugh, my bad, I misread that last check. Ignore my comment.

> 
>> The sme_me_mask should probably be cleared at this point before returning,
>> too. Alternatively, in sev_get_status(), you can make the setting of
>> sme_me_mask conditional on MSR_AMD64_SEV_ENABLED being set in the SEV_STATUS MSR.
>>
> 
> I'll go for the latter, seems cleaner not to touch it in that case.

Sounds good.

Thanks
Tom

> 
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -422,10 +422,12 @@ u64 sev_get_status(void)
>          if (!(eax & BIT(1)))
>                  return 0;
> 
> -       /* Set the SME mask if this is an SEV guest. */
> -       sme_me_mask = BIT_ULL(ebx & 0x3f);
> -
>          boot_rdmsr(MSR_AMD64_SEV, &m);
> +
> +       /* Set the SME mask if this is an SEV guest. */
> +       if (m.q & MSR_AMD64_SEV_ENABLED)
> +               sme_me_mask = BIT_ULL(ebx & 0x3f);
> +
>          return m.q;
>   }
> 
> 
>>>    /* Search for Confidential Computing blob in the EFI config table. */
>>> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
>>> index 86e1296e87f513b7..081c39b0e8d0d208 100644
>>> --- a/arch/x86/include/asm/sev.h
>>> +++ b/arch/x86/include/asm/sev.h
>>> @@ -207,6 +207,8 @@ bool snp_init(struct boot_params *bp);
>>>    void __init __noreturn snp_abort(void);
>>>    int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
>>>    void snp_accept_memory(phys_addr_t start, phys_addr_t end);
>>> +u64 snp_get_unsupported_features(u64 status);
>>> +u64 sev_get_status(void);
>>>    #else
>>>    static inline void sev_es_ist_enter(struct pt_regs *regs) { }
>>>    static inline void sev_es_ist_exit(void) { }
>>> @@ -232,6 +234,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
>>>    }
>>>
>>>    static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
>>> +static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
>>> +static inline u64 sev_get_status(void) { return 0; }
>>>    #endif
>>>
>>>    #endif
>>> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
>>> index abcd5703e9f3f980..1015ef883f5850a4 100644
>>> --- a/drivers/firmware/efi/libstub/x86-stub.c
>>> +++ b/drivers/firmware/efi/libstub/x86-stub.c
>>> @@ -15,6 +15,7 @@
>>>    #include <asm/setup.h>
>>>    #include <asm/desc.h>
>>>    #include <asm/boot.h>
>>> +#include <asm/sev.h>
>>>
>>>    #include "efistub.h"
>>>    #include "x86-stub.h"
>>> @@ -790,6 +791,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
>>>        return EFI_SUCCESS;
>>>    }
>>>
>>> +static bool have_unsupported_snp_features(void)
>>> +{
>>> +     u64 unsupported;
>>> +
>>> +     unsupported = snp_get_unsupported_features(sev_get_status());
>>
>> This will also set sme_me_mask, but I think that is ok: on error things
>> will terminate, and otherwise sev_enable() will update it appropriately
>> later.
>>
> 
> OK
> 
>>> +     if (unsupported) {
>>> +             efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
>>> +                     unsupported);
>>> +             return true;
>>> +     }
>>> +     return false;
>>> +}
>>> +
>>>    static void __noreturn enter_kernel(unsigned long kernel_addr,
>>>                                    struct boot_params *boot_params)
>>>    {
>>> @@ -820,6 +834,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
>>>        if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
>>>                efi_exit(handle, EFI_INVALID_PARAMETER);
>>>
>>> +     if (have_unsupported_snp_features())
>>> +             efi_exit(handle, EFI_UNSUPPORTED);
>>> +
>>>        if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
>>>                efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>>>                if (efi_dxe_table &&


* Re: [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code
  2023-06-07  7:23 ` [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code Ard Biesheuvel
@ 2023-06-07 18:09   ` Yunhong Jiang
  2023-06-08  8:04     ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Yunhong Jiang @ 2023-06-07 18:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Wed, Jun 07, 2023 at 09:23:32AM +0200, Ard Biesheuvel wrote:
> Instead of returning to the asm calling code to invoke the trampoline,
> call it straight from the C code that sets the scene. That way, the
> struct return type is no longer needed for returning two values, and the
> call can be made conditional more cleanly in a subsequent patch.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/boot/compressed/head_64.S    | 20 +++-----------
>  arch/x86/boot/compressed/pgtable_64.c | 28 ++++++++------------
>  2 files changed, 15 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index cdefafd456c70335..3d4da7e5270c8d4d 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -430,24 +430,12 @@ SYM_CODE_START(startup_64)
>  #endif
>  
>  	/*
> -	 * paging_prepare() sets up the trampoline and checks if we need to
> -	 * enable 5-level paging.
> -	 *
> -	 * paging_prepare() returns a two-quadword structure which lands
> -	 * into RDX:RAX:
> -	 *   - Address of the trampoline is returned in RAX.
> -	 *   - Non zero RDX means trampoline needs to enable 5-level
> -	 *     paging.
> -	 *
> +	 * set_paging_levels() updates the number of paging levels using a
> +	 * trampoline in 32-bit addressable memory if the current number does
> +	 * not match the desired number.
>  	 */
>  	movq	%r15, %rdi		/* pass struct boot_params pointer */
> -	call	paging_prepare
> -
> -	/* Pass the trampoline address and boolean flag as args #1 and #2 */
> -	movq	%rax, %rdi
> -	movq	%rdx, %rsi
> -	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
> -	call	*%rax
> +	call	set_paging_levels
>  
>  	/*
>  	 * cleanup_trampoline() would restore trampoline memory.
> diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
> index d66639c961b8eeda..1d28ad95ea839531 100644
> --- a/arch/x86/boot/compressed/pgtable_64.c
> +++ b/arch/x86/boot/compressed/pgtable_64.c
> @@ -16,11 +16,6 @@ unsigned int __section(".data") pgdir_shift = 39;
>  unsigned int __section(".data") ptrs_per_p4d = 1;
>  #endif
>  
> -struct paging_config {
> -	unsigned long trampoline_start;
> -	unsigned long l5_required;
> -};
> -
>  /* Buffer to preserve trampoline memory */
>  static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
>  
> @@ -106,10 +101,10 @@ static unsigned long find_trampoline_placement(void)
>  	return bios_start - TRAMPOLINE_32BIT_SIZE;
>  }
>  
> -struct paging_config paging_prepare(void *rmode)
> +asmlinkage void set_paging_levels(void *rmode)

Can you please change the reference to paging_prepare() in the comments above as well?


* Re: [PATCH v5 01/20] x86/efistub: Branch straight to kernel entry point from C code
  2023-06-07  7:23 ` [PATCH v5 01/20] x86/efistub: Branch straight to kernel entry point from C code Ard Biesheuvel
@ 2023-06-07 18:53   ` Borislav Petkov
  2023-06-07 19:39     ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2023-06-07 18:53 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Wed, Jun 07, 2023 at 09:23:23AM +0200, Ard Biesheuvel wrote:
> -	return bzimage_addr;
> +	if (IS_ENABLED(CONFIG_X86_64))
> +		/* add offset of startup_64() */
> +		bzimage_addr += 0x200;

Uh, magic.

Well, there's this:

arch/x86/boot/compressed/head_64.S:

        .code64
        .org 0x200
SYM_CODE_START(startup_64)
        /*
         * 64bit entry is 0x200 and it is ABI so immutable!
         * We come here either from startup_32 or directly from a
         * 64bit bootloader.


Looking at Documentation/arch/x86/boot.rst, we actually say in the
xloadflags section:

  Bit 0 (read): XLF_KERNEL_64

        - If 1, this kernel has the legacy 64-bit entry point at 0x200.

and header.S sets that:

xloadflags:
#ifdef CONFIG_X86_64
# define XLF0 XLF_KERNEL_64                     /* 64-bit kernel */

so checking CONFIG_X86_64 is probably ok.

It might be cleaner, though, if you test XLF_KERNEL_64 directly and act
accordingly. Although, do I understand it correctly that the EFI
libstub goes together with the kernel it was built for, so the checks
would be doing the same thing...? I.e., the libstub cannot somehow be
"glued" onto another kernel which doesn't set CONFIG_X86_64.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v5 08/20] x86/decompressor: Use standard calling convention for trampoline
  2023-06-07  7:23 ` [PATCH v5 08/20] x86/decompressor: Use standard calling convention for trampoline Ard Biesheuvel
@ 2023-06-07 19:38   ` Yunhong Jiang
  2023-06-07 20:07     ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Yunhong Jiang @ 2023-06-07 19:38 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Wed, Jun 07, 2023 at 09:23:30AM +0200, Ard Biesheuvel wrote:
> Update the trampoline code so its arguments are passed via RDI and RSI,
> which matches the ordinary SysV calling convention for x86_64. This will
> allow this code to be called directly from C.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/boot/compressed/head_64.S | 30 +++++++++-----------
>  arch/x86/boot/compressed/pgtable.h |  2 +-
>  2 files changed, 14 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index af45ddd8297a4a07..a387cd80964e1a1e 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -443,9 +443,9 @@ SYM_CODE_START(startup_64)
>  	movq	%r15, %rdi		/* pass struct boot_params pointer */
>  	call	paging_prepare
>  
> -	/* Save the trampoline address in RCX */
> -	movq	%rax, %rcx
> -
> +	/* Pass the trampoline address and boolean flag as args #1 and #2 */
> +	movq	%rax, %rdi
> +	movq	%rdx, %rsi
>  	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
>  	call	*%rax
>  
> @@ -534,11 +534,11 @@ SYM_FUNC_END(.Lrelocated)
>  /*
>   * This is the 32-bit trampoline that will be copied over to low memory.
>   *
> - * ECX contains the base address of the trampoline memory.
> - * Non zero RDX means trampoline needs to enable 5-level paging.
> + * EDI contains the base address of the trampoline memory.
> + * Non-zero ESI means trampoline needs to enable 5-level paging.
>   */
>  SYM_CODE_START(trampoline_32bit_src)

After the whole patchset, this function now only switches the paging level;
is my understanding correct? After all, it's converted to toggle_la57
directly in the following patches. If that's the case, would it make sense
to rename it correspondingly?

Also, to align with toggle_la57, would we make the first parameter just the
page table, instead of the trampoline memory address?
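
I.e., something along the lines of (just a sketch):

	void toggle_la57(void *pgtable);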

> -	popq	%rdi
> +	popq	%r8
>  	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
>  	pushq	$__KERNEL32_CS
>  	leaq	0f(%rip), %rax
> @@ -552,7 +552,7 @@ SYM_CODE_START(trampoline_32bit_src)
>  	movl	%eax, %ss
>  
>  	/* Set up new stack */
> -	leal	TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
> +	leal	TRAMPOLINE_32BIT_STACK_END(%edi), %esp
>  
>  	/* Disable paging */
>  	movl	%cr0, %eax
> @@ -560,7 +560,7 @@ SYM_CODE_START(trampoline_32bit_src)
>  	movl	%eax, %cr0
>  
>  	/* Check what paging mode we want to be in after the trampoline */
> -	testl	%edx, %edx
> +	testl	%esi, %esi
>  	jz	1f
>  
>  	/* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
> @@ -575,21 +575,17 @@ SYM_CODE_START(trampoline_32bit_src)
>  	jz	3f
>  2:
>  	/* Point CR3 to the trampoline's new top level page table */
> -	leal	TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
> +	leal	TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
>  	movl	%eax, %cr3
>  3:
>  	/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
> -	pushl	%ecx
> -	pushl	%edx
>  	movl	$MSR_EFER, %ecx
>  	rdmsr
>  	btsl	$_EFER_LME, %eax
>  	/* Avoid writing EFER if no change was made (for TDX guest) */
>  	jc	1f
>  	wrmsr
> -1:	popl	%edx
> -	popl	%ecx
> -
> +1:
>  #ifdef CONFIG_X86_MCE
>  	/*
>  	 * Preserve CR4.MCE if the kernel will enable #MC support.
> @@ -606,14 +602,14 @@ SYM_CODE_START(trampoline_32bit_src)
>  
>  	/* Enable PAE and LA57 (if required) paging modes */
>  	orl	$X86_CR4_PAE, %eax
> -	testl	%edx, %edx
> +	testl	%esi, %esi
>  	jz	1f
>  	orl	$X86_CR4_LA57, %eax
>  1:
>  	movl	%eax, %cr4
>  
>  	/* Calculate address of paging_enabled() once we are executing in the trampoline */
> -	leal	.Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
> +	leal	.Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%edi), %eax
>  
>  	/* Prepare the stack for far return to Long Mode */
>  	pushl	$__KERNEL_CS
> @@ -630,7 +626,7 @@ SYM_CODE_END(trampoline_32bit_src)
>  	.code64
>  SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
>  	/* Return from the trampoline */
> -	jmp	*%rdi
> +	jmp	*%r8
>  SYM_FUNC_END(.Lpaging_enabled)
>  
>  	/*
> diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
> index 91dbb99203fbce2d..4e8cef135226bcbb 100644
> --- a/arch/x86/boot/compressed/pgtable.h
> +++ b/arch/x86/boot/compressed/pgtable.h
> @@ -14,7 +14,7 @@
>  
>  extern unsigned long *trampoline_32bit;
>  
> -extern void trampoline_32bit_src(void *return_ptr);
> +extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
>  
>  #endif /* __ASSEMBLER__ */
>  #endif /* BOOT_COMPRESSED_PAGETABLE_H */
> -- 
> 2.39.2
> 


* Re: [PATCH v5 01/20] x86/efistub: Branch straight to kernel entry point from C code
  2023-06-07 18:53   ` Borislav Petkov
@ 2023-06-07 19:39     ` Ard Biesheuvel
  0 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07 19:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Wed, 7 Jun 2023 at 20:53, Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Jun 07, 2023 at 09:23:23AM +0200, Ard Biesheuvel wrote:
> > -     return bzimage_addr;
> > +     if (IS_ENABLED(CONFIG_X86_64))
> > +             /* add offset of startup_64() */
> > +             bzimage_addr += 0x200;
>
> Uh, magic.
>
> Well, there's this:
>
> arch/x86/boot/compressed/head_64.S:
>
>         .code64
>         .org 0x200
> SYM_CODE_START(startup_64)
>         /*
>          * 64bit entry is 0x200 and it is ABI so immutable!
>          * We come here either from startup_32 or directly from a
>          * 64bit bootloader.
>
>
> Looking at Documentation/arch/x86/boot.rst, we actually say in the
> xloadflags section:
>
>   Bit 0 (read): XLF_KERNEL_64
>
>         - If 1, this kernel has the legacy 64-bit entry point at 0x200.
>
> and header.S sets that:
>
> xloadflags:
> #ifdef CONFIG_X86_64
> # define XLF0 XLF_KERNEL_64                     /* 64-bit kernel */
>
> so checking CONFIG_X86_64 is probably ok.
>
> It might be cleaner, though, if you test XLF_KERNEL_64 directly and act
> accordingly. Although, do I understand it correctly that the EFI
> libstub goes together with the kernel it was built for, so the checks
> would be doing the same thing...? I.e., the libstub cannot somehow be
> "glued" onto another kernel which doesn't set CONFIG_X86_64.
>

This code gets removed again later in the series, so I didn't bother
with any of this.

I think checking those XLF flags from code is something to avoid in
any case - they are part of the file representation, and consumed by
loaders. I am not sure we can assume that they are always accessible
from the running code (and EFI is not guaranteed to load the first
sector, as this is the file header, not the payload).


* Re: [PATCH v5 08/20] x86/decompressor: Use standard calling convention for trampoline
  2023-06-07 19:38   ` Yunhong Jiang
@ 2023-06-07 20:07     ` Ard Biesheuvel
  0 siblings, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07 20:07 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Wed, 7 Jun 2023 at 21:38, Yunhong Jiang
<yunhong.jiang@linux.intel.com> wrote:
>
> On Wed, Jun 07, 2023 at 09:23:30AM +0200, Ard Biesheuvel wrote:
> > Update the trampoline code so its arguments are passed via RDI and RSI,
> > which matches the ordinary SysV calling convention for x86_64. This will
> > allow this code to be called directly from C.
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/boot/compressed/head_64.S | 30 +++++++++-----------
> >  arch/x86/boot/compressed/pgtable.h |  2 +-
> >  2 files changed, 14 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index af45ddd8297a4a07..a387cd80964e1a1e 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -443,9 +443,9 @@ SYM_CODE_START(startup_64)
> >       movq    %r15, %rdi              /* pass struct boot_params pointer */
> >       call    paging_prepare
> >
> > -     /* Save the trampoline address in RCX */
> > -     movq    %rax, %rcx
> > -
> > +     /* Pass the trampoline address and boolean flag as args #1 and #2 */
> > +     movq    %rax, %rdi
> > +     movq    %rdx, %rsi
> >       leaq    TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
> >       call    *%rax
> >
> > @@ -534,11 +534,11 @@ SYM_FUNC_END(.Lrelocated)
> >  /*
> >   * This is the 32-bit trampoline that will be copied over to low memory.
> >   *
> > - * ECX contains the base address of the trampoline memory.
> > - * Non zero RDX means trampoline needs to enable 5-level paging.
> > + * EDI contains the base address of the trampoline memory.
> > + * Non-zero ESI means trampoline needs to enable 5-level paging.
> >   */
> >  SYM_CODE_START(trampoline_32bit_src)
>
> After the whole patchset, this function now only switches the paging level;
> is my understanding correct? After all, it's converted to toggle_la57
> directly in the following patches. If that's the case, would it make sense
> to rename it correspondingly?
>

This is template code that is copied to a 32-bit addressable buffer
and called there.
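
The caller copies it into that buffer before invoking it, along the lines
of what paging_prepare() does today (simplified):

	/* copy the 32-bit trampoline template into low memory */
	memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
	       &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);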

> Also, to align with toggle_la57, would we make the first parameter just the
> page table, instead of the trampoline memory address?
>

Sure, but instead of rewriting this code from scratch, I decided to
make incremental changes to it, for easier review and bisect.

Of course, we could change the name and the prototype; another thing we
might do is drop the second argument, which is redundant now that we
always toggle and never preserve the existing value of LA57.

However, this was not necessary for making the code reusable from the
EFI stub, so I left it for further cleanup.

> > -     popq    %rdi
> > +     popq    %r8
> >       /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
> >       pushq   $__KERNEL32_CS
> >       leaq    0f(%rip), %rax
> > @@ -552,7 +552,7 @@ SYM_CODE_START(trampoline_32bit_src)
> >       movl    %eax, %ss
> >
> >       /* Set up new stack */
> > -     leal    TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
> > +     leal    TRAMPOLINE_32BIT_STACK_END(%edi), %esp
> >
> >       /* Disable paging */
> >       movl    %cr0, %eax
> > @@ -560,7 +560,7 @@ SYM_CODE_START(trampoline_32bit_src)
> >       movl    %eax, %cr0
> >
> >       /* Check what paging mode we want to be in after the trampoline */
> > -     testl   %edx, %edx
> > +     testl   %esi, %esi
> >       jz      1f
> >
> >       /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
> > @@ -575,21 +575,17 @@ SYM_CODE_START(trampoline_32bit_src)
> >       jz      3f
> >  2:
> >       /* Point CR3 to the trampoline's new top level page table */
> > -     leal    TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
> > +     leal    TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
> >       movl    %eax, %cr3
> >  3:
> >       /* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
> > -     pushl   %ecx
> > -     pushl   %edx
> >       movl    $MSR_EFER, %ecx
> >       rdmsr
> >       btsl    $_EFER_LME, %eax
> >       /* Avoid writing EFER if no change was made (for TDX guest) */
> >       jc      1f
> >       wrmsr
> > -1:   popl    %edx
> > -     popl    %ecx
> > -
> > +1:
> >  #ifdef CONFIG_X86_MCE
> >       /*
> >        * Preserve CR4.MCE if the kernel will enable #MC support.
> > @@ -606,14 +602,14 @@ SYM_CODE_START(trampoline_32bit_src)
> >
> >       /* Enable PAE and LA57 (if required) paging modes */
> >       orl     $X86_CR4_PAE, %eax
> > -     testl   %edx, %edx
> > +     testl   %esi, %esi
> >       jz      1f
> >       orl     $X86_CR4_LA57, %eax
> >  1:
> >       movl    %eax, %cr4
> >
> >       /* Calculate address of paging_enabled() once we are executing in the trampoline */
> > -     leal    .Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
> > +     leal    .Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%edi), %eax
> >
> >       /* Prepare the stack for far return to Long Mode */
> >       pushl   $__KERNEL_CS
> > @@ -630,7 +626,7 @@ SYM_CODE_END(trampoline_32bit_src)
> >       .code64
> >  SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
> >       /* Return from the trampoline */
> > -     jmp     *%rdi
> > +     jmp     *%r8
> >  SYM_FUNC_END(.Lpaging_enabled)
> >
> >       /*
> > diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
> > index 91dbb99203fbce2d..4e8cef135226bcbb 100644
> > --- a/arch/x86/boot/compressed/pgtable.h
> > +++ b/arch/x86/boot/compressed/pgtable.h
> > @@ -14,7 +14,7 @@
> >
> >  extern unsigned long *trampoline_32bit;
> >
> > -extern void trampoline_32bit_src(void *return_ptr);
> > +extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
> >
> >  #endif /* __ASSEMBLER__ */
> >  #endif /* BOOT_COMPRESSED_PAGETABLE_H */
> > --
> > 2.39.2
> >


* Re: [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub
  2023-06-07  7:23 ` [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub Ard Biesheuvel
@ 2023-06-07 20:19   ` Yunhong Jiang
  2023-06-07 20:31     ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Yunhong Jiang @ 2023-06-07 20:19 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Wed, Jun 07, 2023 at 09:23:35AM +0200, Ard Biesheuvel wrote:
> In preparation for updating the EFI stub boot flow to avoid the bare
> metal decompressor code altogether, implement the support code for
> switching between 4 and 5 levels of paging before jumping to the kernel
> proper.
> 
> This reuses the newly refactored trampoline that the bare metal
> decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
> memory and remap it with the appropriate permissions. Given that the
> bare metal decompressor will no longer call into the trampoline if the
> number of paging levels is already set correctly, there is no longer a need
> to remove NX restrictions from the memory range where this trampoline
> may end up.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  drivers/firmware/efi/libstub/Makefile          |  1 +
>  drivers/firmware/efi/libstub/efi-stub-helper.c |  2 +
>  drivers/firmware/efi/libstub/efistub.h         |  1 +
>  drivers/firmware/efi/libstub/x86-5lvl.c        | 95 ++++++++++++++++++++
>  drivers/firmware/efi/libstub/x86-stub.c        | 40 +++------
>  drivers/firmware/efi/libstub/x86-stub.h        | 17 ++++
>  6 files changed, 130 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 16d64a34d1e19465..ae8874401a9f1490 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB)	+= efi-stub.o string.o intrinsics.o systable.o \
>  lib-$(CONFIG_ARM)		+= arm32-stub.o
>  lib-$(CONFIG_ARM64)		+= arm64.o arm64-stub.o smbios.o
>  lib-$(CONFIG_X86)		+= x86-stub.o
> +lib-$(CONFIG_X86_64)		+= x86-5lvl.o
>  lib-$(CONFIG_RISCV)		+= riscv.o riscv-stub.o
>  lib-$(CONFIG_LOONGARCH)		+= loongarch.o loongarch-stub.o
>  
> diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> index 1e0203d74691ffcc..51779279fbff21b5 100644
> --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> @@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
>  			efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
>  		} else if (!strcmp(param, "noinitrd")) {
>  			efi_noinitrd = true;
> +		} else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
> +			efi_no5lvl = true;
>  		} else if (!strcmp(param, "efi") && val) {
>  			efi_nochunk = parse_option_str(val, "nochunk");
>  			efi_novamap |= parse_option_str(val, "novamap");
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index 6aa38a1bf1265d83..06b7abc92ced9e18 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -33,6 +33,7 @@
>  #define EFI_ALLOC_LIMIT		ULONG_MAX
>  #endif
>  
> +extern bool efi_no5lvl;
>  extern bool efi_nochunk;
>  extern bool efi_nokaslr;
>  extern int efi_loglevel;
> diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> new file mode 100644
> index 0000000000000000..2428578a3ae08be7
> --- /dev/null
> +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/efi.h>
> +
> +#include <asm/boot.h>
> +#include <asm/desc.h>
> +#include <asm/efi.h>
> +
> +#include "efistub.h"
> +#include "x86-stub.h"
> +
> +bool efi_no5lvl;
> +
> +static void (*la57_toggle)(void *trampoline, bool enable_5lvl);

As a follow-up to my comments on another patch, would it make more sense to
rename the trampoline parameter to newcr3 and pass the address of the new
page table, instead of the trampoline start address?

> +
> +static const struct desc_struct gdt[] = {
> +	[GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
> +	[GDT_ENTRY_KERNEL_CS]   = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
> +};
> +
> +/*
> + * Enabling (or disabling) 5 level paging is tricky, because it can only be
> + * done from 32-bit mode with paging disabled. This means not only that the
> + * code itself must be running from 32-bit addressable physical memory, but
> + * also that the root page table must be 32-bit addressable, as programming
> + * a 64-bit value into CR3 when running in 32-bit mode is not supported.
> + */
> +efi_status_t efi_setup_5level_paging(void)
> +{
> +	u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
> +	efi_status_t status;
> +	u8 *la57_code;
> +
> +	if (!efi_is_64bit())
> +		return EFI_SUCCESS;
> +
> +	/* check for 5 level paging support */
> +	if (native_cpuid_eax(0) < 7 ||
> +	    !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> +		return EFI_SUCCESS;
> +
Should we check need_toggle here instead of in efi_5level_switch(), and skip
the whole setup if there is no need to switch paging levels? Sorry if I
missed something.

> +	/* allocate some 32-bit addressable memory for code and a page table */
> +	status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
> +				    U32_MAX);
> +	if (status != EFI_SUCCESS)
> +		return status;
> +
> +	la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
> +	memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
> +
> +	/*
> +	 * To avoid the need to allocate a 32-bit addressable stack, the
> +	 * trampoline uses a LJMP instruction to switch back to long mode.
> +	 * LJMP takes an absolute destination address, which needs to be
> +	 * fixed up at runtime.
> +	 */
> +	*(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
> +
> +	efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
> +
> +	return EFI_SUCCESS;
> +}
> +
> +void efi_5level_switch(void)
> +{
> +	bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
> +	bool have_la57 = native_read_cr4() & X86_CR4_LA57;
> +	bool need_toggle = want_la57 ^ have_la57;
> +	u64 *pgt = (void *)la57_toggle + PAGE_SIZE;

I wonder if we can decouple this address assumption between pgt and
la57_toggle, and keep pgt as a separate variable, like la57_toggle, also set
up by efi_setup_5level_paging().
Asking because with Intel X86-S
(https://cdrdv2-public.intel.com/776648/x86s-EAS-v1-4-17-23-1.pdf) no
trampoline code is needed, since the 4/5 level paging switch does not
require disabling paging. Of course, it's fine to keep this as is, and we
can change it later when we begin working on X86-S support.
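
For reference, a minimal sketch of the layout assumption being discussed,
based only on the quoted patch (pgt_from_toggle() is a made-up helper name,
used purely for illustration):

	/*
	 * efi_setup_5level_paging() makes one 2-page, 32-bit addressable
	 * allocation: trampoline code in page 0, root page table in
	 * page 1. efi_5level_switch() then derives the page table
	 * address from la57_toggle alone.
	 */
	static void (*la57_toggle)(void *trampoline, bool enable_5lvl);

	static u64 *pgt_from_toggle(void)
	{
		/* the coupling in question: pgt sits one page past the code */
		return (void *)la57_toggle + PAGE_SIZE;
	}

Decoupling them would mean letting efi_setup_5level_paging() record the page
table address in a variable of its own, which an X86-S flow could populate
without allocating any trampoline code.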

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub
  2023-06-07 20:19   ` Yunhong Jiang
@ 2023-06-07 20:31     ` Ard Biesheuvel
  2023-06-08  0:43       ` Yunhong Jiang
  0 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-07 20:31 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Wed, 7 Jun 2023 at 22:19, Yunhong Jiang
<yunhong.jiang@linux.intel.com> wrote:
>
> On Wed, Jun 07, 2023 at 09:23:35AM +0200, Ard Biesheuvel wrote:
> > In preparation for updating the EFI stub boot flow to avoid the bare
> > metal decompressor code altogether, implement the support code for
> > switching between 4 and 5 levels of paging before jumping to the kernel
> > proper.
> >
> > This reuses the newly refactored trampoline that the bare metal
> > decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
> > memory and remap it with the appropriate permissions. Given that the
> > bare metal decompressor will no longer call into the trampoline if the
> > number of paging levels is already set correctly, there is no longer a
> > need to remove NX restrictions from the memory range where this trampoline
> > may end up.
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  drivers/firmware/efi/libstub/Makefile          |  1 +
> >  drivers/firmware/efi/libstub/efi-stub-helper.c |  2 +
> >  drivers/firmware/efi/libstub/efistub.h         |  1 +
> >  drivers/firmware/efi/libstub/x86-5lvl.c        | 95 ++++++++++++++++++++
> >  drivers/firmware/efi/libstub/x86-stub.c        | 40 +++------
> >  drivers/firmware/efi/libstub/x86-stub.h        | 17 ++++
> >  6 files changed, 130 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 16d64a34d1e19465..ae8874401a9f1490 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB)      += efi-stub.o string.o intrinsics.o systable.o \
> >  lib-$(CONFIG_ARM)            += arm32-stub.o
> >  lib-$(CONFIG_ARM64)          += arm64.o arm64-stub.o smbios.o
> >  lib-$(CONFIG_X86)            += x86-stub.o
> > +lib-$(CONFIG_X86_64)         += x86-5lvl.o
> >  lib-$(CONFIG_RISCV)          += riscv.o riscv-stub.o
> >  lib-$(CONFIG_LOONGARCH)              += loongarch.o loongarch-stub.o
> >
> > diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > index 1e0203d74691ffcc..51779279fbff21b5 100644
> > --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> > +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > @@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
> >                       efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
> >               } else if (!strcmp(param, "noinitrd")) {
> >                       efi_noinitrd = true;
> > +             } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
> > +                     efi_no5lvl = true;
> >               } else if (!strcmp(param, "efi") && val) {
> >                       efi_nochunk = parse_option_str(val, "nochunk");
> >                       efi_novamap |= parse_option_str(val, "novamap");
> > diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> > index 6aa38a1bf1265d83..06b7abc92ced9e18 100644
> > --- a/drivers/firmware/efi/libstub/efistub.h
> > +++ b/drivers/firmware/efi/libstub/efistub.h
> > @@ -33,6 +33,7 @@
> >  #define EFI_ALLOC_LIMIT              ULONG_MAX
> >  #endif
> >
> > +extern bool efi_no5lvl;
> >  extern bool efi_nochunk;
> >  extern bool efi_nokaslr;
> >  extern int efi_loglevel;
> > diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> > new file mode 100644
> > index 0000000000000000..2428578a3ae08be7
> > --- /dev/null
> > +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> > @@ -0,0 +1,95 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +#include <linux/efi.h>
> > +
> > +#include <asm/boot.h>
> > +#include <asm/desc.h>
> > +#include <asm/efi.h>
> > +
> > +#include "efistub.h"
> > +#include "x86-stub.h"
> > +
> > +bool efi_no5lvl;
> > +
> > +static void (*la57_toggle)(void *trampoline, bool enable_5lvl);
>
> As a follow-up to my comments on another patch, would it make more sense to
> rename the trampoline parameter to newcr3 and pass the address of the new
> page table, instead of the trampoline start address?
>

Perhaps, but please realise that my goal here was not to invent an API
from scratch. There was existing code that I made minimal changes to
in order to be able to reuse it.

If this needs further changes, you can always send follow-up patches.

> > +
> > +static const struct desc_struct gdt[] = {
> > +     [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
> > +     [GDT_ENTRY_KERNEL_CS]   = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
> > +};
> > +
> > +/*
> > + * Enabling (or disabling) 5 level paging is tricky, because it can only be
> > + * done from 32-bit mode with paging disabled. This means not only that the
> > + * code itself must be running from 32-bit addressable physical memory, but
> > + * also that the root page table must be 32-bit addressable, as programming
> > + * a 64-bit value into CR3 when running in 32-bit mode is not supported.
> > + */
> > +efi_status_t efi_setup_5level_paging(void)
> > +{
> > +     u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
> > +     efi_status_t status;
> > +     u8 *la57_code;
> > +
> > +     if (!efi_is_64bit())
> > +             return EFI_SUCCESS;
> > +
> > +     /* check for 5 level paging support */
> > +     if (native_cpuid_eax(0) < 7 ||
> > +         !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> > +             return EFI_SUCCESS;
> > +
> Should we check need_toggle here instead of in efi_5level_switch(), and skip
> the whole setup if there is no need to switch paging levels? Sorry if I
> missed something.
>

No. There are reasons why firmware might run with 5 levels, and switch
to 4 levels at ExitBootServices() time.

> > +     /* allocate some 32-bit addressable memory for code and a page table */
> > +     status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
> > +                                 U32_MAX);
> > +     if (status != EFI_SUCCESS)
> > +             return status;
> > +
> > +     la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
> > +     memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
> > +
> > +     /*
> > +      * To avoid the need to allocate a 32-bit addressable stack, the
> > +      * trampoline uses a LJMP instruction to switch back to long mode.
> > +      * LJMP takes an absolute destination address, which needs to be
> > +      * fixed up at runtime.
> > +      */
> > +     *(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
> > +
> > +     efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
> > +
> > +     return EFI_SUCCESS;
> > +}
> > +
> > +void efi_5level_switch(void)
> > +{
> > +     bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
> > +     bool have_la57 = native_read_cr4() & X86_CR4_LA57;
> > +     bool need_toggle = want_la57 ^ have_la57;
> > +     u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
>
> I wonder if we can decouple this address assumption between pgt and
> la57_toggle, and keep pgt as a separate variable, like la57_toggle, also set
> up by efi_setup_5level_paging().
> Asking because with Intel X86-S
> (https://cdrdv2-public.intel.com/776648/x86s-EAS-v1-4-17-23-1.pdf) no
> trampoline code is needed, since the 4/5 level paging switch does not
> require disabling paging. Of course, it's fine to keep this as is, and we
> can change it later when we begin working on X86-S support.

We can make further changes as needed. The current interface is based
on the existing code.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub
  2023-06-07 20:31     ` Ard Biesheuvel
@ 2023-06-08  0:43       ` Yunhong Jiang
  2023-06-08  6:34         ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Yunhong Jiang @ 2023-06-08  0:43 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Wed, Jun 07, 2023 at 10:31:43PM +0200, Ard Biesheuvel wrote:
> On Wed, 7 Jun 2023 at 22:19, Yunhong Jiang
> <yunhong.jiang@linux.intel.com> wrote:
> >
> > On Wed, Jun 07, 2023 at 09:23:35AM +0200, Ard Biesheuvel wrote:
> > > In preparation for updating the EFI stub boot flow to avoid the bare
> > > metal decompressor code altogether, implement the support code for
> > > switching between 4 and 5 levels of paging before jumping to the kernel
> > > proper.
> > >
> > > This reuses the newly refactored trampoline that the bare metal
> > > decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
> > > memory and remap it with the appropriate permissions. Given that the
> > > bare metal decompressor will no longer call into the trampoline if the
> > > number of paging levels is already set correctly, there is no longer a
> > > need to remove NX restrictions from the memory range where this trampoline
> > > may end up.
> > >
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  drivers/firmware/efi/libstub/Makefile          |  1 +
> > >  drivers/firmware/efi/libstub/efi-stub-helper.c |  2 +
> > >  drivers/firmware/efi/libstub/efistub.h         |  1 +
> > >  drivers/firmware/efi/libstub/x86-5lvl.c        | 95 ++++++++++++++++++++
> > >  drivers/firmware/efi/libstub/x86-stub.c        | 40 +++------
> > >  drivers/firmware/efi/libstub/x86-stub.h        | 17 ++++
> > >  6 files changed, 130 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > > index 16d64a34d1e19465..ae8874401a9f1490 100644
> > > --- a/drivers/firmware/efi/libstub/Makefile
> > > +++ b/drivers/firmware/efi/libstub/Makefile
> > > @@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB)      += efi-stub.o string.o intrinsics.o systable.o \
> > >  lib-$(CONFIG_ARM)            += arm32-stub.o
> > >  lib-$(CONFIG_ARM64)          += arm64.o arm64-stub.o smbios.o
> > >  lib-$(CONFIG_X86)            += x86-stub.o
> > > +lib-$(CONFIG_X86_64)         += x86-5lvl.o
> > >  lib-$(CONFIG_RISCV)          += riscv.o riscv-stub.o
> > >  lib-$(CONFIG_LOONGARCH)              += loongarch.o loongarch-stub.o
> > >
> > > diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > index 1e0203d74691ffcc..51779279fbff21b5 100644
> > > --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > @@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
> > >                       efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
> > >               } else if (!strcmp(param, "noinitrd")) {
> > >                       efi_noinitrd = true;
> > > +             } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
> > > +                     efi_no5lvl = true;
> > >               } else if (!strcmp(param, "efi") && val) {
> > >                       efi_nochunk = parse_option_str(val, "nochunk");
> > >                       efi_novamap |= parse_option_str(val, "novamap");
> > > diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> > > index 6aa38a1bf1265d83..06b7abc92ced9e18 100644
> > > --- a/drivers/firmware/efi/libstub/efistub.h
> > > +++ b/drivers/firmware/efi/libstub/efistub.h
> > > @@ -33,6 +33,7 @@
> > >  #define EFI_ALLOC_LIMIT              ULONG_MAX
> > >  #endif
> > >
> > > +extern bool efi_no5lvl;
> > >  extern bool efi_nochunk;
> > >  extern bool efi_nokaslr;
> > >  extern int efi_loglevel;
> > > diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> > > new file mode 100644
> > > index 0000000000000000..2428578a3ae08be7
> > > --- /dev/null
> > > +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> > > @@ -0,0 +1,95 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +#include <linux/efi.h>
> > > +
> > > +#include <asm/boot.h>
> > > +#include <asm/desc.h>
> > > +#include <asm/efi.h>
> > > +
> > > +#include "efistub.h"
> > > +#include "x86-stub.h"
> > > +
> > > +bool efi_no5lvl;
> > > +
> > > +static void (*la57_toggle)(void *trampoline, bool enable_5lvl);
> >
> > As a follow-up to my comments on another patch, would it make more sense to
> > rename the trampoline parameter to newcr3 and pass the address of the new
> > page table, instead of the trampoline start address?
> >
> 
> Perhaps, but please realise that my goal here was not to invent an API
> from scratch. There was existing code that I made minimal changes to
> in order to be able to reuse it.
> 
> If this needs further changes, you can always send follow-up patches.

Sure, will do that as follow-up patches.
> 
> > > +
> > > +static const struct desc_struct gdt[] = {
> > > +     [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
> > > +     [GDT_ENTRY_KERNEL_CS]   = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
> > > +};
> > > +
> > > +/*
> > > + * Enabling (or disabling) 5 level paging is tricky, because it can only be
> > > + * done from 32-bit mode with paging disabled. This means not only that the
> > > + * code itself must be running from 32-bit addressable physical memory, but
> > > + * also that the root page table must be 32-bit addressable, as programming
> > > + * a 64-bit value into CR3 when running in 32-bit mode is not supported.
> > > + */
> > > +efi_status_t efi_setup_5level_paging(void)
> > > +{
> > > +     u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
> > > +     efi_status_t status;
> > > +     u8 *la57_code;
> > > +
> > > +     if (!efi_is_64bit())
> > > +             return EFI_SUCCESS;
> > > +
> > > +     /* check for 5 level paging support */
> > > +     if (native_cpuid_eax(0) < 7 ||
> > > +         !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> > > +             return EFI_SUCCESS;
> > > +
> > Should we check need_toggle here instead of in efi_5level_switch(), and skip
> > the whole setup if there is no need to switch paging levels? Sorry if I
> > missed something.
> >
> 
> No. There are reasons why firmware might run with 5 levels, and switch
> to 4 levels at ExitBootServices() time.

The need_toggle check in efi_5level_switch(), "need_toggle = want_la57 ^
have_la57", should cover this scenario, right? If we checked need_toggle in
efi_setup_5level_paging() and it were false, then we wouldn't need the setup
in efi_setup_5level_paging(), right? I don't see la57_toggle() called
anywhere else.

Or I misunderstand your response?

> 
> > > +     /* allocate some 32-bit addressable memory for code and a page table */
> > > +     status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
> > > +                                 U32_MAX);
> > > +     if (status != EFI_SUCCESS)
> > > +             return status;
> > > +
> > > +     la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
> > > +     memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
> > > +
> > > +     /*
> > > +      * To avoid the need to allocate a 32-bit addressable stack, the
> > > +      * trampoline uses a LJMP instruction to switch back to long mode.
> > > +      * LJMP takes an absolute destination address, which needs to be
> > > +      * fixed up at runtime.
> > > +      */
> > > +     *(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
> > > +
> > > +     efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
> > > +
> > > +     return EFI_SUCCESS;
> > > +}
> > > +
> > > +void efi_5level_switch(void)
> > > +{
> > > +     bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
> > > +     bool have_la57 = native_read_cr4() & X86_CR4_LA57;
> > > +     bool need_toggle = want_la57 ^ have_la57;
> > > +     u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
> >
> > I wonder if we can decouple this address assumption between pgt and
> > la57_toggle, and keep pgt as a separate variable, like la57_toggle, also set
> > up by efi_setup_5level_paging().
> > Asking because with Intel X86-S
> > (https://cdrdv2-public.intel.com/776648/x86s-EAS-v1-4-17-23-1.pdf) no
> > trampoline code is needed, since the 4/5 level paging switch does not
> > require disabling paging. Of course, it's fine to keep this as is, and we
> > can change it later when we begin working on X86-S support.
> 
> We can make further changes as needed. The current interface is based
> on the existing code.

Sure. Will make the change in the future.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub
  2023-06-08  0:43       ` Yunhong Jiang
@ 2023-06-08  6:34         ` Ard Biesheuvel
  2023-06-08 16:10           ` Yunhong Jiang
  0 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-08  6:34 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Thu, 8 Jun 2023 at 02:43, Yunhong Jiang
<yunhong.jiang@linux.intel.com> wrote:
>
> On Wed, Jun 07, 2023 at 10:31:43PM +0200, Ard Biesheuvel wrote:
> > On Wed, 7 Jun 2023 at 22:19, Yunhong Jiang
> > <yunhong.jiang@linux.intel.com> wrote:
> > >
> > > On Wed, Jun 07, 2023 at 09:23:35AM +0200, Ard Biesheuvel wrote:
...
> > > > +efi_status_t efi_setup_5level_paging(void)
> > > > +{
> > > > +     u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
> > > > +     efi_status_t status;
> > > > +     u8 *la57_code;
> > > > +
> > > > +     if (!efi_is_64bit())
> > > > +             return EFI_SUCCESS;
> > > > +
> > > > +     /* check for 5 level paging support */
> > > > +     if (native_cpuid_eax(0) < 7 ||
> > > > +         !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> > > > +             return EFI_SUCCESS;
> > > > +
> > > Should we check need_toggle here instead of in efi_5level_switch(), and skip
> > > the whole setup if there is no need to switch paging levels? Sorry if I
> > > missed something.
> > >
> >
> > No. There are reasons why firmware might run with 5 levels, and switch
> > to 4 levels at ExitBootServices() time.
>
> The need_toggle check in efi_5level_switch(), "need_toggle = want_la57 ^
> have_la57", should cover this scenario, right? If we checked need_toggle in
> efi_setup_5level_paging() and it were false, then we wouldn't need the setup
> in efi_setup_5level_paging(), right? I don't see la57_toggle() called
> anywhere else.
>
> Or I misunderstand your response?
>

The actual, current number of paging levels could be 5 during
efi_setup_5level_paging() and 4 during efi_5level_switch(). So whether we
need to toggle can only be decided at switch time, at which point we can no
longer allocate memory. The allocation logic in efi_setup_5level_paging()
therefore cannot depend on the actual number of levels, only on whether
5-level paging is supported at all (if it is not, a switch is never needed).
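
In other words, a condensed sketch of the two-phase flow described above;
cpu_supports_la57() and allocate_trampoline() are made-up stand-ins for the
CPUID check and efi_allocate_pages() logic in the actual patch:

	/* Phase 1: boot services available, allocation still possible.
	 * Allocate whenever LA57 is supported at all, since the number
	 * of paging levels in use may still change before phase 2. */
	efi_status_t efi_setup_5level_paging(void)
	{
		if (!cpu_supports_la57())
			return EFI_SUCCESS;	/* a switch can never be needed */
		return allocate_trampoline();	/* code + page table below 4 GiB */
	}

	/* Phase 2: after ExitBootServices(), allocation is impossible,
	 * but only now can need_toggle be evaluated meaningfully. */
	void efi_5level_switch(void)
	{
		bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
		bool have_la57 = native_read_cr4() & X86_CR4_LA57;

		if (want_la57 ^ have_la57)
			la57_toggle(la57_toggle, want_la57);	/* arg: trampoline start */
	}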

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code
  2023-06-07 18:09   ` Yunhong Jiang
@ 2023-06-08  8:04     ` Ard Biesheuvel
  2023-06-08 18:15       ` Yunhong Jiang
  0 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-08  8:04 UTC (permalink / raw)
  To: Yunhong Jiang
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Wed, 7 Jun 2023 at 20:09, Yunhong Jiang
<yunhong.jiang@linux.intel.com> wrote:
>
> On Wed, Jun 07, 2023 at 09:23:32AM +0200, Ard Biesheuvel wrote:
> > Instead of returning to the asm calling code to invoke the trampoline,
> > call it straight from the C code that sets the scene. That way, the
> > struct return type is no longer needed for returning two values, and the
> > call can be made conditional more cleanly in a subsequent patch.
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/boot/compressed/head_64.S    | 20 +++-----------
> >  arch/x86/boot/compressed/pgtable_64.c | 28 ++++++++------------
> >  2 files changed, 15 insertions(+), 33 deletions(-)
> >
...
> > diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
> > index d66639c961b8eeda..1d28ad95ea839531 100644
> > --- a/arch/x86/boot/compressed/pgtable_64.c
> > +++ b/arch/x86/boot/compressed/pgtable_64.c
> > @@ -16,11 +16,6 @@ unsigned int __section(".data") pgdir_shift = 39;
> >  unsigned int __section(".data") ptrs_per_p4d = 1;
> >  #endif
> >
> > -struct paging_config {
> > -     unsigned long trampoline_start;
> > -     unsigned long l5_required;
> > -};
> > -
> >  /* Buffer to preserve trampoline memory */
> >  static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
> >
> > @@ -106,10 +101,10 @@ static unsigned long find_trampoline_placement(void)
> >       return bios_start - TRAMPOLINE_32BIT_SIZE;
> >  }
> >
> > -struct paging_config paging_prepare(void *rmode)
> > +asmlinkage void set_paging_levels(void *rmode)
>
> Can you please change the reference to paging_prepare() in the comments above as well?

You mean the below, right?

--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -24,7 +24,7 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
  * purposes.
  *
  * Avoid putting the pointer into .bss as it will be cleared between
- * paging_prepare() and extract_kernel().
+ * set_paging_levels() and extract_kernel().
  */
 unsigned long *trampoline_32bit __section(".data");

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub
  2023-06-08  6:34         ` Ard Biesheuvel
@ 2023-06-08 16:10           ` Yunhong Jiang
  0 siblings, 0 replies; 42+ messages in thread
From: Yunhong Jiang @ 2023-06-08 16:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Thu, Jun 08, 2023 at 08:34:38AM +0200, Ard Biesheuvel wrote:
> On Thu, 8 Jun 2023 at 02:43, Yunhong Jiang
> <yunhong.jiang@linux.intel.com> wrote:
> >
> > On Wed, Jun 07, 2023 at 10:31:43PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 7 Jun 2023 at 22:19, Yunhong Jiang
> > > <yunhong.jiang@linux.intel.com> wrote:
> > > >
> > > > On Wed, Jun 07, 2023 at 09:23:35AM +0200, Ard Biesheuvel wrote:
> ...
> > > > > +efi_status_t efi_setup_5level_paging(void)
> > > > > +{
> > > > > +     u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
> > > > > +     efi_status_t status;
> > > > > +     u8 *la57_code;
> > > > > +
> > > > > +     if (!efi_is_64bit())
> > > > > +             return EFI_SUCCESS;
> > > > > +
> > > > > +     /* check for 5 level paging support */
> > > > > +     if (native_cpuid_eax(0) < 7 ||
> > > > > +         !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> > > > > +             return EFI_SUCCESS;
> > > > > +
> > > > Should we check need_toggle here instead of in efi_5level_switch(), and skip
> > > > the whole setup if there is no need to switch paging levels? Sorry if I
> > > > missed something.
> > > >
> > >
> > > No. There are reasons why firmware might run with 5 levels, and switch
> > > to 4 levels at ExitBootServices() time.
> >
> > The need_toggle check in efi_5level_switch(), "need_toggle = want_la57 ^
> > have_la57", should cover this scenario, right? If we checked need_toggle in
> > efi_setup_5level_paging() and it were false, then we wouldn't need the setup
> > in efi_setup_5level_paging(), right? I don't see la57_toggle() called
> > anywhere else.
> >
> > Or I misunderstand your response?
> >
> 
> The actual, current number of paging levels could be 5 during
> efi_setup_5level_paging() and 4 during efi_5level_switch(). So whether we
> need to toggle can only be decided at switch time, at which point we can no
> longer allocate memory. The allocation logic in efi_setup_5level_paging()
> therefore cannot depend on the actual number of levels, only on whether
> 5-level paging is supported at all (if it is not, a switch is never needed).

Oh, I didn't realize that. Thank you for the clarification.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code
  2023-06-08  8:04     ` Ard Biesheuvel
@ 2023-06-08 18:15       ` Yunhong Jiang
  0 siblings, 0 replies; 42+ messages in thread
From: Yunhong Jiang @ 2023-06-08 18:15 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Borislav Petkov,
	Andy Lutomirski, Dave Hansen, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Alexey Khoroshilov, Peter Jones, Gerd Hoffmann,
	Dave Young, Mario Limonciello, Kees Cook, Tom Lendacky,
	Kirill A . Shutemov, Linus Torvalds, Joerg Roedel

On Thu, Jun 08, 2023 at 10:04:43AM +0200, Ard Biesheuvel wrote:
> On Wed, 7 Jun 2023 at 20:09, Yunhong Jiang
> <yunhong.jiang@linux.intel.com> wrote:
> >
> > On Wed, Jun 07, 2023 at 09:23:32AM +0200, Ard Biesheuvel wrote:
> > > Instead of returning to the asm calling code to invoke the trampoline,
> > > call it straight from the C code that sets the scene. That way, the
> > > struct return type is no longer needed for returning two values, and the
> > > call can be made conditional more cleanly in a subsequent patch.
> > >
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/x86/boot/compressed/head_64.S    | 20 +++-----------
> > >  arch/x86/boot/compressed/pgtable_64.c | 28 ++++++++------------
> > >  2 files changed, 15 insertions(+), 33 deletions(-)
> > >
> ...
> > > diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
> > > index d66639c961b8eeda..1d28ad95ea839531 100644
> > > --- a/arch/x86/boot/compressed/pgtable_64.c
> > > +++ b/arch/x86/boot/compressed/pgtable_64.c
> > > @@ -16,11 +16,6 @@ unsigned int __section(".data") pgdir_shift = 39;
> > >  unsigned int __section(".data") ptrs_per_p4d = 1;
> > >  #endif
> > >
> > > -struct paging_config {
> > > -     unsigned long trampoline_start;
> > > -     unsigned long l5_required;
> > > -};
> > > -
> > >  /* Buffer to preserve trampoline memory */
> > >  static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
> > >
> > > @@ -106,10 +101,10 @@ static unsigned long find_trampoline_placement(void)
> > >       return bios_start - TRAMPOLINE_32BIT_SIZE;
> > >  }
> > >
> > > -struct paging_config paging_prepare(void *rmode)
> > > +asmlinkage void set_paging_levels(void *rmode)
> >
> > Can you please change the reference to paging_prepare() in the comments above as well?
> 
> You mean the below, right?
> 
> --- a/arch/x86/boot/compressed/pgtable_64.c
> +++ b/arch/x86/boot/compressed/pgtable_64.c
> @@ -24,7 +24,7 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
>   * purposes.
>   *
>   * Avoid putting the pointer into .bss as it will be cleared between
> - * paging_prepare() and extract_kernel().
> + * set_paging_levels() and extract_kernel().
>   */
>  unsigned long *trampoline_32bit __section(".data");

Yes.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 05/20] x86/decompressor: Use proper sequence to take the address of the GOT
  2023-06-07  7:23 ` [PATCH v5 05/20] x86/decompressor: Use proper sequence to take the address of the GOT Ard Biesheuvel
@ 2023-06-21 11:08   ` Borislav Petkov
  2023-06-23 14:00     ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2023-06-21 11:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Wed, Jun 07, 2023 at 09:23:27AM +0200, Ard Biesheuvel wrote:
> The 32-bit decompressor does not actually use a global offset table
> (GOT), but as is common for 32-bit position independent code, it uses
> the magic symbol _GLOBAL_OFFSET_TABLE_ as an anchor from which to derive
> the actual runtime addresses of other symbols, using special @GOTOFF
> symbol references that are resolved at link time, and populated with the
> distance between the address of the magic _GLOBAL_OFFSET_TABLE_ anchor
> and the address of the symbol in question.
> 
> This means _GLOBAL_OFFSET_TABLE_ is the only symbol whose actual runtime
> address needs to be determined explicitly, which is one of the first
> things that happens in startup_32. However, it does so by taking the
> absolute address via the immediate field of an ADD instruction (plus a
> small offset), which seems to defeat the point.
> 
> Fortunately, the assembler knows that _GLOBAL_OFFSET_TABLE_ is magic,
> and emits a special relative relocation instead, and so the resulting

Which special relocation do you mean?

This guy:

Relocation section '.rel.head.text' at offset 0x3a0 contains 12 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000010  00000d0a R_386_GOTPC       00000000   _GLOBAL_OFFSET_TABLE_

?

In any case, this thing came from

a2c4fc4d4e2c ("x86/boot: Remove run-time relocations from .head.text code")

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 05/20] x86/decompressor: Use proper sequence to take the address of the GOT
  2023-06-21 11:08   ` Borislav Petkov
@ 2023-06-23 14:00     ` Ard Biesheuvel
  2023-07-07 13:56       ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-06-23 14:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Wed, 21 Jun 2023 at 13:09, Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Jun 07, 2023 at 09:23:27AM +0200, Ard Biesheuvel wrote:
> > The 32-bit decompressor does not actually use a global offset table
> > (GOT), but as is common for 32-bit position independent code, it uses
> > the magic symbol _GLOBAL_OFFSET_TABLE_ as an anchor from which to derive
> > the actual runtime addresses of other symbols, using special @GOTOFF
> > symbol references that are resolved at link time, and populated with the
> > distance between the address of the magic _GLOBAL_OFFSET_TABLE_ anchor
> > and the address of the symbol in question.
> >
> > This means _GLOBAL_OFFSET_TABLE_ is the only symbol whose actual runtime
> > address needs to be determined explicitly, which is one of the first
> > things that happens in startup_32. However, it does so by taking the
> > absolute address via the immediate field of an ADD instruction (plus a
> > small offset), which seems to defeat the point.
> >
> > Fortunately, the assembler knows that _GLOBAL_OFFSET_TABLE_ is magic,
> > and emits a special relative relocation instead, and so the resulting
>
> Which special relocation do you mean?
>
> This guy:
>
> Relocation section '.rel.head.text' at offset 0x3a0 contains 12 entries:
>  Offset     Info    Type            Sym.Value  Sym. Name
> 00000010  00000d0a R_386_GOTPC       00000000   _GLOBAL_OFFSET_TABLE_
>
> ?

Yep.

if you assemble this

movl $_GLOBAL_OFFSET_TABLE_, %eax
movl $_GLOBAL_OFFSET_TABLE, %eax

you'll end up with

   0: b8 01 00 00 00        mov    $0x1,%eax
1: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
   5: b8 00 00 00 00        mov    $0x0,%eax
6: R_386_32 _GLOBAL_OFFSET_TABLE

So it is not possible to take the absolute address of
_GLOBAL_OFFSET_TABLE_ via an absolute relocation; you will always get
the relative offset instead.
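
For context, the canonical way to consume that R_386_GOTPC relocation is
the call/pop sequence; a generic sketch of the standard 32-bit PIC idiom
follows (not a quote from the patch):

	call	1f
1:	popl	%edx					# %edx = runtime address of 1b
	addl	$_GLOBAL_OFFSET_TABLE_+(.-1b), %edx	# %edx = GOT anchor

R_386_GOTPC resolves to the distance between the GOT anchor and the place
being relocated, so adding the runtime program counter recovers the anchor's
runtime address, after which SYMBOL@GOTOFF(%edx) references resolve
correctly.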

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 05/20] x86/decompressor: Use proper sequence to take the address of the GOT
  2023-06-23 14:00     ` Ard Biesheuvel
@ 2023-07-07 13:56       ` Borislav Petkov
  0 siblings, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2023-07-07 13:56 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Fri, Jun 23, 2023 at 04:00:30PM +0200, Ard Biesheuvel wrote:
> if you assemble this
> 
> movl $_GLOBAL_OFFSET_TABLE_, %eax
> movl $_GLOBAL_OFFSET_TABLE, %eax
> 
> you'll end up with
> 
>    0: b8 01 00 00 00        mov    $0x1,%eax
> 1: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
>    5: b8 00 00 00 00        mov    $0x0,%eax
> 6: R_386_32 _GLOBAL_OFFSET_TABLE
> 
> So it is not possible to take the absolute address of
> _GLOBAL_OFFSET_TABLE_ via an absolute relocation; you will always get
> the relative offset instead.

Right.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 06/20] x86/decompressor: Store boot_params pointer in callee save register
  2023-06-07  7:23 ` [PATCH v5 06/20] x86/decompressor: Store boot_params pointer in callee save register Ard Biesheuvel
@ 2023-07-10  9:06   ` Borislav Petkov
  2023-07-10 21:55     ` Ard Biesheuvel
  0 siblings, 1 reply; 42+ messages in thread
From: Borislav Petkov @ 2023-07-10  9:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Wed, Jun 07, 2023 at 09:23:28AM +0200, Ard Biesheuvel wrote:
> Instead of pushing and popping %RSI several times to preserve the struct
> boot_params pointer across the execution of the startup code, move it
> into a callee save register before the first call into C, and copy it
> back when needed.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/boot/compressed/head_64.S | 34 +++++++-------------
>  1 file changed, 11 insertions(+), 23 deletions(-)

I like that.

We do a similar dance in arch/x86/kernel/head_64.S. Care to fix that
too, in a separate patch?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
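
(A condensed sketch of the pattern under discussion, for illustration only;
the call target is made up, and %r15 stands in for whichever callee-saved
register the patch actually picks:)

	/* entry: %rsi = struct boot_params pointer, per the boot protocol */
	movq	%rsi, %r15	/* stash it once in a callee-saved register */

	call	some_c_helper	/* %r15 survives calls into C code */

	movq	%r15, %rsi	/* copy back only where %rsi is expected */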

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 06/20] x86/decompressor: Store boot_params pointer in callee save register
  2023-07-10  9:06   ` Borislav Petkov
@ 2023-07-10 21:55     ` Ard Biesheuvel
  2023-07-11  7:57       ` Borislav Petkov
  0 siblings, 1 reply; 42+ messages in thread
From: Ard Biesheuvel @ 2023-07-10 21:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Mon, 10 Jul 2023 at 11:07, Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Jun 07, 2023 at 09:23:28AM +0200, Ard Biesheuvel wrote:
> > Instead of pushing and popping %RSI several times to preserve the struct
> > boot_params pointer across the execution of the startup code, move it
> > into a callee save register before the first call into C, and copy it
> > back when needed.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/boot/compressed/head_64.S | 34 +++++++-------------
> >  1 file changed, 11 insertions(+), 23 deletions(-)
>
> I like that.
>
> We do a similar dance in arch/x86/kernel/head_64.S. Care to fix that
> too, in a separate patch?
>

I already did, actually, but I dropped it from this series because it
was getting too long, and not essential for the overall goal of the
changes.

https://lore.kernel.org/all/20230602101313.3557775-19-ardb@kernel.org/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v5 06/20] x86/decompressor: Store boot_params pointer in callee save register
  2023-07-10 21:55     ` Ard Biesheuvel
@ 2023-07-11  7:57       ` Borislav Petkov
  0 siblings, 0 replies; 42+ messages in thread
From: Borislav Petkov @ 2023-07-11  7:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-efi, linux-kernel, Evgeniy Baskov, Andy Lutomirski,
	Dave Hansen, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Alexey Khoroshilov, Peter Jones, Gerd Hoffmann, Dave Young,
	Mario Limonciello, Kees Cook, Tom Lendacky, Kirill A . Shutemov,
	Linus Torvalds, Joerg Roedel

On Mon, Jul 10, 2023 at 11:55:49PM +0200, Ard Biesheuvel wrote:
> I already did, actually, but I dropped it from this series because it
> was getting too long, and not essential for the overall goal of the
> changes.

Yeah, might wanna add it, though, for the simple reason that
compressed/head_64.S and kernel/head_64.S will otherwise differ in the
%rsi handling unnecessarily and people might wonder why.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2023-07-11  7:58 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 01/20] x86/efistub: Branch straight to kernel entry point from C code Ard Biesheuvel
2023-06-07 18:53   ` Borislav Petkov
2023-06-07 19:39     ` Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 02/20] x86/efistub: Simplify and clean up handover entry code Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 03/20] x86/decompressor: Avoid magic offsets for EFI handover entrypoint Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 04/20] x86/efistub: Clear BSS in EFI handover protocol entrypoint Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 05/20] x86/decompressor: Use proper sequence to take the address of the GOT Ard Biesheuvel
2023-06-21 11:08   ` Borislav Petkov
2023-06-23 14:00     ` Ard Biesheuvel
2023-07-07 13:56       ` Borislav Petkov
2023-06-07  7:23 ` [PATCH v5 06/20] x86/decompressor: Store boot_params pointer in callee save register Ard Biesheuvel
2023-07-10  9:06   ` Borislav Petkov
2023-07-10 21:55     ` Ard Biesheuvel
2023-07-11  7:57       ` Borislav Petkov
2023-06-07  7:23 ` [PATCH v5 07/20] x86/decompressor: Call trampoline as a normal function Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 08/20] x86/decompressor: Use standard calling convention for trampoline Ard Biesheuvel
2023-06-07 19:38   ` Yunhong Jiang
2023-06-07 20:07     ` Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 09/20] x86/decompressor: Avoid the need for a stack in the 32-bit trampoline Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code Ard Biesheuvel
2023-06-07 18:09   ` Yunhong Jiang
2023-06-08  8:04     ` Ard Biesheuvel
2023-06-08 18:15       ` Yunhong Jiang
2023-06-07  7:23 ` [PATCH v5 11/20] x86/decompressor: Only call the trampoline when changing paging levels Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 12/20] x86/decompressor: Merge trampoline cleanup with switching code Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub Ard Biesheuvel
2023-06-07 20:19   ` Yunhong Jiang
2023-06-07 20:31     ` Ard Biesheuvel
2023-06-08  0:43       ` Yunhong Jiang
2023-06-08  6:34         ` Ard Biesheuvel
2023-06-08 16:10           ` Yunhong Jiang
2023-06-07  7:23 ` [PATCH v5 14/20] x86/efistub: Prefer EFI memory attributes protocol over DXE services Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 15/20] decompress: Use 8 byte alignment Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 16/20] x86/decompressor: Move global symbol references to C code Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 17/20] x86/decompressor: Factor out kernel decompression and relocation Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 18/20] efi/libstub: Add limit argument to efi_random_alloc() Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware Ard Biesheuvel
2023-06-07 16:07   ` Tom Lendacky
2023-06-07 16:51     ` Ard Biesheuvel
2023-06-07 17:29       ` Tom Lendacky
2023-06-07  7:23 ` [PATCH v5 20/20] x86/efistub: Avoid legacy decompressor when doing EFI boot Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).