linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yunhong Jiang <yunhong.jiang@linux.intel.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Evgeniy Baskov <baskov@ispras.ru>, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alexey Khoroshilov <khoroshilov@ispras.ru>,
	Peter Jones <pjones@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>, Dave Young <dyoung@redhat.com>,
	Mario Limonciello <mario.limonciello@amd.com>,
	Kees Cook <keescook@chromium.org>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Joerg Roedel <jroedel@suse.de>
Subject: Re: [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub
Date: Wed, 7 Jun 2023 13:19:24 -0700	[thread overview]
Message-ID: <20230607201924.GD3110@yjiang5-mobl.amr.corp.intel.com> (raw)
In-Reply-To: <20230607072342.4054036-14-ardb@kernel.org>

On Wed, Jun 07, 2023 at 09:23:35AM +0200, Ard Biesheuvel wrote:
> In preparation for updating the EFI stub boot flow to avoid the bare
> metal decompressor code altogether, implement the support code for
> switching between 4 and 5 levels of paging before jumping to the kernel
> proper.
> 
> This reuses the newly refactored trampoline that the bare metal
> decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
> memory and remap it with the appropriate permissions. Given that the
> bare metal decompressor will no longer call into the trampoline if the
> number of paging levels is already set correctly, it is no longer needed
> to remove NX restrictions from the memory range where this trampoline
> may end up.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  drivers/firmware/efi/libstub/Makefile          |  1 +
>  drivers/firmware/efi/libstub/efi-stub-helper.c |  2 +
>  drivers/firmware/efi/libstub/efistub.h         |  1 +
>  drivers/firmware/efi/libstub/x86-5lvl.c        | 95 ++++++++++++++++++++
>  drivers/firmware/efi/libstub/x86-stub.c        | 40 +++------
>  drivers/firmware/efi/libstub/x86-stub.h        | 17 ++++
>  6 files changed, 130 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 16d64a34d1e19465..ae8874401a9f1490 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB)	+= efi-stub.o string.o intrinsics.o systable.o \
>  lib-$(CONFIG_ARM)		+= arm32-stub.o
>  lib-$(CONFIG_ARM64)		+= arm64.o arm64-stub.o smbios.o
>  lib-$(CONFIG_X86)		+= x86-stub.o
> +lib-$(CONFIG_X86_64)		+= x86-5lvl.o
>  lib-$(CONFIG_RISCV)		+= riscv.o riscv-stub.o
>  lib-$(CONFIG_LOONGARCH)		+= loongarch.o loongarch-stub.o
>  
> diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> index 1e0203d74691ffcc..51779279fbff21b5 100644
> --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> @@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
>  			efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
>  		} else if (!strcmp(param, "noinitrd")) {
>  			efi_noinitrd = true;
> +		} else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
> +			efi_no5lvl = true;
>  		} else if (!strcmp(param, "efi") && val) {
>  			efi_nochunk = parse_option_str(val, "nochunk");
>  			efi_novamap |= parse_option_str(val, "novamap");
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index 6aa38a1bf1265d83..06b7abc92ced9e18 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -33,6 +33,7 @@
>  #define EFI_ALLOC_LIMIT		ULONG_MAX
>  #endif
>  
> +extern bool efi_no5lvl;
>  extern bool efi_nochunk;
>  extern bool efi_nokaslr;
>  extern int efi_loglevel;
> diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> new file mode 100644
> index 0000000000000000..2428578a3ae08be7
> --- /dev/null
> +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/efi.h>
> +
> +#include <asm/boot.h>
> +#include <asm/desc.h>
> +#include <asm/efi.h>
> +
> +#include "efistub.h"
> +#include "x86-stub.h"
> +
> +bool efi_no5lvl;
> +
> +static void (*la57_toggle)(void *trampoline, bool enable_5lvl);

As an ack to my comments to another patch, would it makes more sense to rename
the trampoline parameter to newcr3 and pass the address of the new page table,
instead of the trampoline start address?

> +
> +static const struct desc_struct gdt[] = {
> +	[GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
> +	[GDT_ENTRY_KERNEL_CS]   = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
> +};
> +
> +/*
> + * Enabling (or disabling) 5 level paging is tricky, because it can only be
> + * done from 32-bit mode with paging disabled. This means not only that the
> + * code itself must be running from 32-bit addressable physical memory, but
> + * also that the root page table must be 32-bit addressable, as programming
> + * a 64-bit value into CR3 when running in 32-bit mode is not supported.
> + */
> +efi_status_t efi_setup_5level_paging(void)
> +{
> +	u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
> +	efi_status_t status;
> +	u8 *la57_code;
> +
> +	if (!efi_is_64bit())
> +		return EFI_SUCCESS;
> +
> +	/* check for 5 level paging support */
> +	if (native_cpuid_eax(0) < 7 ||
> +	    !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
> +		return EFI_SUCCESS;
> +
Do we need to check the need_toggle here instead of at efi_5level_switch and
skip the whole setup if no need to switch the paging level? Sorry if I missed
any point.

> +	/* allocate some 32-bit addressable memory for code and a page table */
> +	status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
> +				    U32_MAX);
> +	if (status != EFI_SUCCESS)
> +		return status;
> +
> +	la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
> +	memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
> +
> +	/*
> +	 * To avoid the need to allocate a 32-bit addressable stack, the
> +	 * trampoline uses a LJMP instruction to switch back to long mode.
> +	 * LJMP takes an absolute destination address, which needs to be
> +	 * fixed up at runtime.
> +	 */
> +	*(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
> +
> +	efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
> +
> +	return EFI_SUCCESS;
> +}
> +
> +void efi_5level_switch(void)
> +{
> +	bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
> +	bool have_la57 = native_read_cr4() & X86_CR4_LA57;
> +	bool need_toggle = want_la57 ^ have_la57;
> +	u64 *pgt = (void *)la57_toggle + PAGE_SIZE;

Not sure if we can decouple this address assumption of the pgt and la57_toggle,
and keep the pgt as a variable, like la57_toggle, setup by
efi_setup_5level_paging() too.
Asking because with the Intel X86-S
(https://cdrdv2-public.intel.com/776648/x86s-EAS-v1-4-17-23-1.pdf), no
tramopline code is needed since the 4/5 level paging switch does not require
paging disabling. Of course, it's ok to keep this as is, and we can change
late when we begin working on X86-S support.

  reply	other threads:[~2023-06-07 20:21 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-07  7:23 [PATCH v5 00/20] efi/x86: Avoid bare metal decompressor during EFI boot Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 01/20] x86/efistub: Branch straight to kernel entry point from C code Ard Biesheuvel
2023-06-07 18:53   ` Borislav Petkov
2023-06-07 19:39     ` Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 02/20] x86/efistub: Simplify and clean up handover entry code Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 03/20] x86/decompressor: Avoid magic offsets for EFI handover entrypoint Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 04/20] x86/efistub: Clear BSS in EFI handover protocol entrypoint Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 05/20] x86/decompressor: Use proper sequence to take the address of the GOT Ard Biesheuvel
2023-06-21 11:08   ` Borislav Petkov
2023-06-23 14:00     ` Ard Biesheuvel
2023-07-07 13:56       ` Borislav Petkov
2023-06-07  7:23 ` [PATCH v5 06/20] x86/decompressor: Store boot_params pointer in callee save register Ard Biesheuvel
2023-07-10  9:06   ` Borislav Petkov
2023-07-10 21:55     ` Ard Biesheuvel
2023-07-11  7:57       ` Borislav Petkov
2023-06-07  7:23 ` [PATCH v5 07/20] x86/decompressor: Call trampoline as a normal function Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 08/20] x86/decompressor: Use standard calling convention for trampoline Ard Biesheuvel
2023-06-07 19:38   ` Yunhong Jiang
2023-06-07 20:07     ` Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 09/20] x86/decompressor: Avoid the need for a stack in the 32-bit trampoline Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 10/20] x86/decompressor: Call trampoline directly from C code Ard Biesheuvel
2023-06-07 18:09   ` Yunhong Jiang
2023-06-08  8:04     ` Ard Biesheuvel
2023-06-08 18:15       ` Yunhong Jiang
2023-06-07  7:23 ` [PATCH v5 11/20] x86/decompressor: Only call the trampoline when changing paging levels Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 12/20] x86/decompressor: Merge trampoline cleanup with switching code Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 13/20] x86/efistub: Perform 4/5 level paging switch from the stub Ard Biesheuvel
2023-06-07 20:19   ` Yunhong Jiang [this message]
2023-06-07 20:31     ` Ard Biesheuvel
2023-06-08  0:43       ` Yunhong Jiang
2023-06-08  6:34         ` Ard Biesheuvel
2023-06-08 16:10           ` Yunhong Jiang
2023-06-07  7:23 ` [PATCH v5 14/20] x86/efistub: Prefer EFI memory attributes protocol over DXE services Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 15/20] decompress: Use 8 byte alignment Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 16/20] x86/decompressor: Move global symbol references to C code Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 17/20] x86/decompressor: Factor out kernel decompression and relocation Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 18/20] efi/libstub: Add limit argument to efi_random_alloc() Ard Biesheuvel
2023-06-07  7:23 ` [PATCH v5 19/20] x86/efistub: Perform SNP feature test while running in the firmware Ard Biesheuvel
2023-06-07 16:07   ` Tom Lendacky
2023-06-07 16:51     ` Ard Biesheuvel
2023-06-07 17:29       ` Tom Lendacky
2023-06-07  7:23 ` [PATCH v5 20/20] x86/efistub: Avoid legacy decompressor when doing EFI boot Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230607201924.GD3110@yjiang5-mobl.amr.corp.intel.com \
    --to=yunhong.jiang@linux.intel.com \
    --cc=ardb@kernel.org \
    --cc=baskov@ispras.ru \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dyoung@redhat.com \
    --cc=jroedel@suse.de \
    --cc=keescook@chromium.org \
    --cc=khoroshilov@ispras.ru \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kraxel@redhat.com \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mario.limonciello@amd.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjones@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).