linux-hardening.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/16] x86_64: Improvements at compressed kernel stage
@ 2022-09-06 10:41 Evgeniy Baskov
  2022-09-06 10:41 ` [PATCH 01/16] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
                   ` (16 more replies)
  0 siblings, 17 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

This patchset is aimed
* to improve UEFI compatibility of compressed kernel code for x86_64
* to setup proper memory access attributes for code and rodata sections
* to implement W^X protection policy throughout the whole execution 
  of compressed kernel for EFISTUB code path. 

Kernel is made to be more compatible with PE image specification [3],
allowing it to be successfully loaded by stricter PE loader
implementations like the one from [2]. There is at least one
known implementation that uses that loader in production [4].
There are also ongoing efforts to upstream these changes.

Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
EFI specification since version 2.10, as a better alternative to
using DXE services for memory protection attributes manipulation,
since it is defined by the UEFI specification itself and not UEFI PI
specification. This protocol is not widely available so the code
using DXE services is kept in place as a fallback in case specific
implementation does not support the new protocol.
One of EFI implementations that already support
EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].
 
Kernel image generation tool (tools/build.c) is refactored as a part
of changes that makes PE image more compatible.
   
The patchset implements memory protection for compressed kernel
code while executing both inside EFI boot services and outside of
them. For EFISTUB code path W^X protection policy is maintained
throughout the whole execution of compressed kernel. The latter
is achieved by extracting the kernel directly from EFI environment
and jumping to it's head immediately after exiting EFI boot services.
As a side effect of this change one page table rebuild and a copy of
the kernel image is removed.

Direct extraction can be toggled using CONFIG_EFI_STUB_EXTRACT_DIRECT.
Memory protection inside EFI environment is controlled by the
CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory
protection attributes of PE sections and not only DXE services as the
name might suggest.

[1] https://lkml.org/lkml/2022/8/1/1314
[2] https://github.com/acidanthera/audk/tree/secure_pe
[3] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
[4] https://www.ispras.ru/en/technologies/asperitas/
[5] https://github.com/microsoft/mu_tiano_platforms

Evgeniy Baskov (16):
  x86/boot: Align vmlinuz sections on page size
  x86/build: Remove RWX sections and align on 4KB
  x86/boot: Set cr0 to known state in trampoline
  x86/boot: Increase boot page table size
  x86/boot: Support 4KB pages for identity mapping
  x86/boot: Setup memory protection for bzImage code
  x86/boot: Map memory explicitly
  x86/boot: Remove mapping from page fault handler
  efi/libstub: Move helper function to related file
  x86/boot: Make console interface more abstract
  x86/boot: Split trampoline and pt init code
  x86/boot: Add EFI kernel extraction interface
  efi/x86: Support extracting kernel from libstub
  x86/build: Make generated PE more spec compliant
  efi/libstub: Add memory attribute protocol definitions
  efi/libstub: Use memory attribute protocol

 arch/x86/boot/Makefile                        |   2 +-
 arch/x86/boot/compressed/Makefile             |   2 +-
 arch/x86/boot/compressed/acpi.c               |  21 +-
 arch/x86/boot/compressed/efi.c                |  19 +-
 arch/x86/boot/compressed/head_32.S            |   9 +-
 arch/x86/boot/compressed/head_64.S            |  77 ++-
 arch/x86/boot/compressed/ident_map_64.c       | 129 ++--
 arch/x86/boot/compressed/kaslr.c              |   4 +
 arch/x86/boot/compressed/misc.c               | 255 ++++----
 arch/x86/boot/compressed/misc.h               |  25 +-
 arch/x86/boot/compressed/pgtable.h            |  20 -
 arch/x86/boot/compressed/pgtable_64.c         |  75 ++-
 arch/x86/boot/compressed/putstr.c             | 133 ++++
 arch/x86/boot/compressed/sev.c                |   6 +-
 arch/x86/boot/compressed/vmlinux.lds.S        |   6 +
 arch/x86/boot/header.S                        | 110 +---
 arch/x86/boot/tools/build.c                   | 575 ++++++++++++------
 arch/x86/include/asm/boot.h                   |  26 +-
 arch/x86/include/asm/efi.h                    |   7 +
 arch/x86/include/asm/init.h                   |   1 +
 arch/x86/include/asm/shared/extract.h         |  25 +
 arch/x86/include/asm/shared/pgtable.h         |  29 +
 arch/x86/kernel/vmlinux.lds.S                 |  15 +-
 arch/x86/mm/ident_map.c                       | 186 +++++-
 drivers/firmware/efi/Kconfig                  |  14 +
 drivers/firmware/efi/libstub/Makefile         |   1 +
 drivers/firmware/efi/libstub/efistub.h        |  31 +
 drivers/firmware/efi/libstub/mem.c            | 189 ++++++
 .../firmware/efi/libstub/x86-extract-direct.c | 220 +++++++
 drivers/firmware/efi/libstub/x86-stub.c       | 172 +++---
 include/linux/efi.h                           |   1 +
 31 files changed, 1701 insertions(+), 684 deletions(-)
 delete mode 100644 arch/x86/boot/compressed/pgtable.h
 create mode 100644 arch/x86/boot/compressed/putstr.c
 create mode 100644 arch/x86/include/asm/shared/extract.h
 create mode 100644 arch/x86/include/asm/shared/pgtable.h
 create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c

-- 
2.35.1


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 01/16] x86/boot: Align vmlinuz sections on page size
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:01   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 02/16] x86/build: Remove RWX sections and align on 4KB Evgeniy Baskov
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

To protect sections on page table level each section
needs to be aligned on page size (4KB).

Set sections alignment in linker script.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/compressed/vmlinux.lds.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 112b2375d021..6be90f1a1198 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -27,21 +27,27 @@ SECTIONS
 		HEAD_TEXT
 		_ehead = . ;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.rodata..compressed : {
+		_compressed = .;
 		*(.rodata..compressed)
+		_ecompressed = .;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.text :	{
 		_text = .; 	/* Text */
 		*(.text)
 		*(.text.*)
 		_etext = . ;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.rodata : {
 		_rodata = . ;
 		*(.rodata)	 /* read-only data */
 		*(.rodata.*)
 		_erodata = . ;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.data :	{
 		_data = . ;
 		*(.data)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 02/16] x86/build: Remove RWX sections and align on 4KB
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
  2022-09-06 10:41 ` [PATCH 01/16] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:04   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Avoid creating sections with maximal privileges to prepare for W^X
implementation. Align sections on page size (4KB) to allow protecting
them in page table.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/kernel/vmlinux.lds.S | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 15f29053cec4..6587e0201b50 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -102,12 +102,11 @@ jiffies = jiffies_64;
 PHDRS {
 	text PT_LOAD FLAGS(5);          /* R_E */
 	data PT_LOAD FLAGS(6);          /* RW_ */
-#ifdef CONFIG_X86_64
-#ifdef CONFIG_SMP
+#if defined(CONFIG_X86_64) && defined(CONFIG_SMP)
 	percpu PT_LOAD FLAGS(6);        /* RW_ */
 #endif
-	init PT_LOAD FLAGS(7);          /* RWE */
-#endif
+	inittext PT_LOAD FLAGS(5);      /* R_E */
+	init PT_LOAD FLAGS(6);          /* RW_ */
 	note PT_NOTE FLAGS(0);          /* ___ */
 }
 
@@ -226,9 +225,10 @@ SECTIONS
 #endif
 
 	INIT_TEXT_SECTION(PAGE_SIZE)
-#ifdef CONFIG_X86_64
-	:init
-#endif
+	:inittext
+
+	. = ALIGN(PAGE_SIZE);
+
 
 	/*
 	 * Section for code used exclusively before alternatives are run. All
@@ -240,6 +240,7 @@ SECTIONS
 	.altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
 		*(.altinstr_aux)
 	}
+	:init
 
 	INIT_DATA_SECTION(16)
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
  2022-09-06 10:41 ` [PATCH 01/16] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
  2022-09-06 10:41 ` [PATCH 02/16] x86/build: Remove RWX sections and align on 4KB Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:06   ` Ard Biesheuvel
  2022-10-19  7:44   ` Andrew Cooper
  2022-09-06 10:41 ` [PATCH 04/16] x86/boot: Increase boot page table size Evgeniy Baskov
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Ensure WP bit to be set to prevent boot code from writing to
non-writable memory pages.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/compressed/head_64.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d33f060900d2..5273367283b7 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -619,9 +619,8 @@ SYM_CODE_START(trampoline_32bit_src)
 	/* Set up new stack */
 	leal	TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
 
-	/* Disable paging */
-	movl	%cr0, %eax
-	btrl	$X86_CR0_PG_BIT, %eax
+	/* Disable paging and setup CR0 */
+	movl	$(CR0_STATE & ~X86_CR0_PG), %eax
 	movl	%eax, %cr0
 
 	/* Check what paging mode we want to be in after the trampoline */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 04/16] x86/boot: Increase boot page table size
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (2 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:08   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 05/16] x86/boot: Support 4KB pages for identity mapping Evgeniy Baskov
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Previous calculations ignored pages implicitly mapped by ACPI code,
so theoretical upper limit is higher than was set.

Using 4KB pages is desirable for better memory protection granularity.
Approximately twice as much memory is required for those.

Increase initial page table size to 64 4KB page tables.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/include/asm/boot.h | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3..024d972c248e 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -41,22 +41,24 @@
 # define BOOT_STACK_SIZE	0x4000
 
 # define BOOT_INIT_PGT_SIZE	(6*4096)
-# ifdef CONFIG_RANDOMIZE_BASE
 /*
  * Assuming all cross the 512GB boundary:
  * 1 page for level4
- * (2+2)*4 pages for kernel, param, cmd_line, and randomized kernel
- * 2 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
- * Total is 19 pages.
+ * (3+3)*2 pages for param and cmd_line
+ * (2+2+S)*2 pages for kernel and randomized kernel, where S is total number
+ *     of sections of kernel. Explanation: 2+2 are upper level page tables.
+ *     We can have only S unaligned parts of section: 1 at the end of the kernel
+ *     and (S-1) at the section borders. The start address of the kernel is
+ *     aligned, so an extra page table. There are at most S=6 sections in
+ *     vmlinux ELF image.
+ * 3 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
+ * Total is 36 pages.
+ *
+ * Some pages are also required for UEFI memory map and
+ * ACPI table mappings, so we need to add extra space.
+ * FIXME: Figure out exact amount of pages.
  */
-#  ifdef CONFIG_X86_VERBOSE_BOOTUP
-#   define BOOT_PGT_SIZE	(19*4096)
-#  else /* !CONFIG_X86_VERBOSE_BOOTUP */
-#   define BOOT_PGT_SIZE	(17*4096)
-#  endif
-# else /* !CONFIG_RANDOMIZE_BASE */
-#  define BOOT_PGT_SIZE		BOOT_INIT_PGT_SIZE
-# endif
+# define BOOT_PGT_SIZE		(64*4096)
 
 #else /* !CONFIG_X86_64 */
 # define BOOT_STACK_SIZE	0x1000
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 05/16] x86/boot: Support 4KB pages for identity mapping
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (3 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 04/16] x86/boot: Increase boot page table size Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:11   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 06/16] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Current identity mapping code only supports 2M and 1G pages.
4KB pages are desirable for better memory protection granularity
in compressed kernel code.

Change identity mapping code to support 4KB pages and
memory remapping with different attributes.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/include/asm/init.h |   1 +
 arch/x86/mm/ident_map.c     | 186 +++++++++++++++++++++++++++++-------
 2 files changed, 155 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 5f1d3c421f68..a8277ee82c51 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -8,6 +8,7 @@ struct x86_mapping_info {
 	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
 	unsigned long offset;		 /* ident mapping offset */
 	bool direct_gbpages;		 /* PUD level 1GB page support */
+	bool allow_4kpages;		 /* Allow more granular mappings with 4K pages */
 	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
 };
 
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..ad455d4ef595 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -2,26 +2,130 @@
 /*
  * Helper routines for building identity mapping page tables. This is
  * included by both the compressed kernel and the regular kernel.
+ *
  */
 
-static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
-			   unsigned long addr, unsigned long end)
+static void ident_pte_init(struct x86_mapping_info *info, pte_t *pte_page,
+			   unsigned long addr, unsigned long end,
+			   unsigned long flags)
 {
-	addr &= PMD_MASK;
-	for (; addr < end; addr += PMD_SIZE) {
+	addr &= PAGE_MASK;
+	for (; addr < end; addr += PAGE_SIZE) {
+		pte_t *pte = pte_page + pte_index(addr);
+
+		set_pte(pte, __pte((addr - info->offset) | flags));
+	}
+}
+
+pte_t *ident_split_large_pmd(struct x86_mapping_info *info,
+			     pmd_t *pmdp, unsigned long page_addr)
+{
+	unsigned long pmd_addr, page_flags;
+	pte_t *pte;
+
+	pte = (pte_t *)info->alloc_pgt_page(info->context);
+	if (!pte)
+		return NULL;
+
+	pmd_addr = page_addr & PMD_MASK;
+
+	/* Not a large page - clear PSE flag */
+	page_flags = pmd_flags(*pmdp) & ~_PSE;
+	ident_pte_init(info, pte, pmd_addr, pmd_addr + PMD_SIZE, page_flags);
+
+	return pte;
+}
+
+static int ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
+			  unsigned long addr, unsigned long end,
+			  unsigned long flags)
+{
+	unsigned long next;
+	bool new_table = 0;
+
+	for (; addr < end; addr = next) {
 		pmd_t *pmd = pmd_page + pmd_index(addr);
+		pte_t *pte;
 
-		if (pmd_present(*pmd))
+		next = (addr & PMD_MASK) + PMD_SIZE;
+		if (next > end)
+			next = end;
+
+		/*
+		 * Use 2M pages if 4k pages are not allowed or
+		 * we are not mapping extra, i.e. address and size are aligned.
+		 */
+
+		if (!info->allow_4kpages ||
+		    (!(addr & ~PMD_MASK) && next == addr + PMD_SIZE)) {
+
+			pmd_t pmdval;
+
+			addr &= PMD_MASK;
+			pmdval = __pmd((addr - info->offset) | flags | _PSE);
+			set_pmd(pmd, pmdval);
 			continue;
+		}
+
+		/*
+		 * If currently mapped page is large, we need to split it.
+		 * The case when we can remap 2M page to 2M page
+		 * with different flags is already covered above.
+		 *
+		 * If there's nothing mapped to desired address,
+		 * we need to allocate new page table.
+		 */
 
-		set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
+		if (pmd_large(*pmd)) {
+			pte = ident_split_large_pmd(info, pmd, addr);
+			new_table = 1;
+		} else if (!pmd_present(*pmd)) {
+			pte = (pte_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
+			pte = pte_offset_kernel(pmd, 0);
+			new_table = 0;
+		}
+
+		if (!pte)
+			return -ENOMEM;
+
+		ident_pte_init(info, pte, addr, next, flags);
+
+		if (new_table)
+			set_pmd(pmd, __pmd(__pa(pte) | info->kernpg_flag));
 	}
+
+	return 0;
 }
 
+
+pmd_t *ident_split_large_pud(struct x86_mapping_info *info,
+			     pud_t *pudp, unsigned long page_addr)
+{
+	unsigned long pud_addr, page_flags;
+	pmd_t *pmd;
+
+	pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+	if (!pmd)
+		return NULL;
+
+	pud_addr = page_addr & PUD_MASK;
+
+	/* Not a large page - clear PSE flag */
+	page_flags = pud_flags(*pudp) & ~_PSE;
+	ident_pmd_init(info, pmd, pud_addr, pud_addr + PUD_SIZE, page_flags);
+
+	return pmd;
+}
+
+
 static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 			  unsigned long addr, unsigned long end)
 {
 	unsigned long next;
+	bool new_table = 0;
+	int result;
 
 	for (; addr < end; addr = next) {
 		pud_t *pud = pud_page + pud_index(addr);
@@ -31,28 +135,39 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		if (next > end)
 			next = end;
 
+		/* Use 1G pages only if forced, even if they are supported. */
 		if (info->direct_gbpages) {
 			pud_t pudval;
-
-			if (pud_present(*pud))
-				continue;
+			unsigned long flags;
 
 			addr &= PUD_MASK;
-			pudval = __pud((addr - info->offset) | info->page_flag);
+			flags = info->page_flag | _PSE;
+			pudval = __pud((addr - info->offset) | flags);
+
 			set_pud(pud, pudval);
 			continue;
 		}
 
-		if (pud_present(*pud)) {
+		if (pud_large(*pud)) {
+			pmd = ident_split_large_pud(info, pud, addr);
+			new_table = 1;
+		} else if (!pud_present(*pud)) {
+			pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
 			pmd = pmd_offset(pud, 0);
-			ident_pmd_init(info, pmd, addr, next);
-			continue;
+			new_table = 0;
 		}
-		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+
 		if (!pmd)
 			return -ENOMEM;
-		ident_pmd_init(info, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
+
+		result = ident_pmd_init(info, pmd, addr, next, info->page_flag);
+		if (result)
+			return result;
+
+		if (new_table)
+			set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
 	}
 
 	return 0;
@@ -63,6 +178,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 {
 	unsigned long next;
 	int result;
+	bool new_table = 0;
 
 	for (; addr < end; addr = next) {
 		p4d_t *p4d = p4d_page + p4d_index(addr);
@@ -72,15 +188,14 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 		if (next > end)
 			next = end;
 
-		if (p4d_present(*p4d)) {
+		if (!p4d_present(*p4d)) {
+			pud = (pud_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
 			pud = pud_offset(p4d, 0);
-			result = ident_pud_init(info, pud, addr, next);
-			if (result)
-				return result;
-
-			continue;
+			new_table = 0;
 		}
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
+
 		if (!pud)
 			return -ENOMEM;
 
@@ -88,19 +203,22 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 		if (result)
 			return result;
 
-		set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
+		if (new_table)
+			set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
 	}
 
 	return 0;
 }
 
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
-			      unsigned long pstart, unsigned long pend)
+int kernel_ident_mapping_init(struct x86_mapping_info *info,
+			      pgd_t *pgd_page, unsigned long pstart,
+			      unsigned long pend)
 {
 	unsigned long addr = pstart + info->offset;
 	unsigned long end = pend + info->offset;
 	unsigned long next;
 	int result;
+	bool new_table;
 
 	/* Set the default pagetable flags if not supplied */
 	if (!info->kernpg_flag)
@@ -117,20 +235,24 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 		if (next > end)
 			next = end;
 
-		if (pgd_present(*pgd)) {
+		if (!pgd_present(*pgd)) {
+			p4d = (p4d_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
 			p4d = p4d_offset(pgd, 0);
-			result = ident_p4d_init(info, p4d, addr, next);
-			if (result)
-				return result;
-			continue;
+			new_table = 0;
 		}
 
-		p4d = (p4d_t *)info->alloc_pgt_page(info->context);
 		if (!p4d)
 			return -ENOMEM;
+
 		result = ident_p4d_init(info, p4d, addr, next);
 		if (result)
 			return result;
+
+		if (!new_table)
+			continue;
+
 		if (pgtable_l5_enabled()) {
 			set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag));
 		} else {
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 06/16] x86/boot: Setup memory protection for bzImage code
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (4 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 05/16] x86/boot: Support 4KB pages for identity mapping Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:17   ` Ard Biesheuvel
  2022-10-19  7:57   ` Andrew Cooper
  2022-09-06 10:41 ` [PATCH 07/16] x86/boot: Map memory explicitly Evgeniy Baskov
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Use previously added code to use 4KB pages for mapping. Map compressed
and uncompressed kernel with appropriate memory protection attributes.
For compressed kernel set them up manually. For uncompressed kernel
used flags specified in ELF header.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

 delete mode 100644 arch/x86/boot/compressed/pgtable.h
 create mode 100644 arch/x86/include/asm/shared/pgtable.h
---
 arch/x86/boot/compressed/head_64.S      | 25 ++++++-
 arch/x86/boot/compressed/ident_map_64.c | 96 ++++++++++++++++---------
 arch/x86/boot/compressed/misc.c         | 63 ++++++++++++++--
 arch/x86/boot/compressed/misc.h         | 16 ++++-
 arch/x86/boot/compressed/pgtable.h      | 20 ------
 arch/x86/boot/compressed/pgtable_64.c   |  2 +-
 arch/x86/boot/compressed/sev.c          |  6 +-
 arch/x86/include/asm/shared/pgtable.h   | 29 ++++++++
 8 files changed, 193 insertions(+), 64 deletions(-)
 delete mode 100644 arch/x86/boot/compressed/pgtable.h
 create mode 100644 arch/x86/include/asm/shared/pgtable.h

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 5273367283b7..889ca7176aa7 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -35,7 +35,7 @@
 #include <asm/bootparam.h>
 #include <asm/desc_defs.h>
 #include <asm/trapnr.h>
-#include "pgtable.h"
+#include <asm/shared/pgtable.h>
 
 /*
  * Locally defined symbols should be marked hidden:
@@ -578,6 +578,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	pushq	%rsi
 	call	load_stage2_idt
 
+	call	startup32_enable_nx_if_supported
 	/* Pass boot_params to initialize_identity_maps() */
 	movq	(%rsp), %rdi
 	call	initialize_identity_maps
@@ -602,6 +603,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	jmp	*%rax
 SYM_FUNC_END(.Lrelocated)
 
+SYM_FUNC_START_LOCAL_NOALIGN(startup32_enable_nx_if_supported)
+	pushq	%rbx
+
+	leaq	has_nx(%rip), %rcx
+
+	mov	$0x80000001, %eax
+	cpuid
+	btl	$20, %edx
+	jnc	.Lnonx
+
+	movl	$1, (%rcx)
+
+	movl	$MSR_EFER, %ecx
+	rdmsr
+	btsl	$_EFER_NX, %eax
+	wrmsr
+
+.Lnonx:
+	popq	%rbx
+	RET
+SYM_FUNC_END(startup32_enable_nx_if_supported)
+
 	.code32
 /*
  * This is the 32-bit trampoline that will be copied over to low memory.
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index d4a314cc50d6..880e08293023 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -28,6 +28,7 @@
 #include <asm/trap_pf.h>
 #include <asm/trapnr.h>
 #include <asm/init.h>
+#include <asm/shared/pgtable.h>
 /* Use the static base for this part of the boot process */
 #undef __PAGE_OFFSET
 #define __PAGE_OFFSET __PAGE_OFFSET_BASE
@@ -86,24 +87,46 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
  * Due to relocation, pointers must be assigned at run time not build time.
  */
 static struct x86_mapping_info mapping_info;
+int has_nx;
 
 /*
  * Adds the specified range to the identity mappings.
  */
-void kernel_add_identity_map(unsigned long start, unsigned long end)
+unsigned long kernel_add_identity_map(unsigned long start,
+				      unsigned long end,
+				      unsigned int flags)
 {
 	int ret;
 
 	/* Align boundary to 2M. */
-	start = round_down(start, PMD_SIZE);
-	end = round_up(end, PMD_SIZE);
+	start = round_down(start, PAGE_SIZE);
+	end = round_up(end, PAGE_SIZE);
 	if (start >= end)
-		return;
+		return start;
+
+	/* Enforce W^X -- just stop booting with error on violation. */
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) &&
+	    (flags & (MAP_EXEC | MAP_WRITE)) == (MAP_EXEC | MAP_WRITE))
+		error("Error: W^X violation\n");
+
+	bool nx = !(flags & MAP_EXEC) && has_nx;
+	bool ro = !(flags & MAP_WRITE);
+
+	mapping_info.page_flag = sme_me_mask | (nx ?
+		(ro ? __PAGE_KERNEL_RO : __PAGE_KERNEL) :
+		(ro ? __PAGE_KERNEL_ROX : __PAGE_KERNEL_EXEC));
 
 	/* Build the mapping. */
-	ret = kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, start, end);
+	ret = kernel_ident_mapping_init(&mapping_info,
+					(pgd_t *)top_level_pgt,
+					start, end);
 	if (ret)
 		error("Error: kernel_ident_mapping_init() failed\n");
+
+	if (!(flags & MAP_NOFLUSH))
+		write_cr3(top_level_pgt);
+
+	return start;
 }
 
 /* Locates and clears a region for a new top level page table. */
@@ -112,14 +135,17 @@ void initialize_identity_maps(void *rmode)
 	unsigned long cmdline;
 	struct setup_data *sd;
 
+	boot_params = rmode;
+
 	/* Exclude the encryption mask from __PHYSICAL_MASK */
 	physical_mask &= ~sme_me_mask;
 
 	/* Init mapping_info with run-time function/buffer pointers. */
 	mapping_info.alloc_pgt_page = alloc_pgt_page;
 	mapping_info.context = &pgt_data;
-	mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask;
+	mapping_info.page_flag = __PAGE_KERNEL_EXEC | sme_me_mask;
 	mapping_info.kernpg_flag = _KERNPG_TABLE;
+	mapping_info.allow_4kpages = 1;
 
 	/*
 	 * It should be impossible for this not to already be true,
@@ -154,15 +180,34 @@ void initialize_identity_maps(void *rmode)
 	/*
 	 * New page-table is set up - map the kernel image, boot_params and the
 	 * command line. The uncompressed kernel requires boot_params and the
-	 * command line to be mapped in the identity mapping. Map them
-	 * explicitly here in case the compressed kernel does not touch them,
-	 * or does not touch all the pages covering them.
+	 * command line to be mapped in the identity mapping.
+	 * Every other accessed memory region is mapped later, if required.
 	 */
-	kernel_add_identity_map((unsigned long)_head, (unsigned long)_end);
-	boot_params = rmode;
-	kernel_add_identity_map((unsigned long)boot_params, (unsigned long)(boot_params + 1));
+	extern char _head[], _ehead[];
+	kernel_add_identity_map((unsigned long)_head,
+				(unsigned long)_ehead, MAP_EXEC | MAP_NOFLUSH);
+
+	extern char _compressed[], _ecompressed[];
+	kernel_add_identity_map((unsigned long)_compressed,
+				(unsigned long)_ecompressed, MAP_WRITE | MAP_NOFLUSH);
+
+	extern char _text[], _etext[];
+	kernel_add_identity_map((unsigned long)_text,
+				(unsigned long)_etext, MAP_EXEC | MAP_NOFLUSH);
+
+	extern char _rodata[], _erodata[];
+	kernel_add_identity_map((unsigned long)_rodata,
+				(unsigned long)_erodata, MAP_NOFLUSH);
+
+	extern char _data[], _end[];
+	kernel_add_identity_map((unsigned long)_data,
+				(unsigned long)_end, MAP_WRITE | MAP_NOFLUSH);
+
+	kernel_add_identity_map((unsigned long)boot_params,
+				(unsigned long)(boot_params + 1), MAP_WRITE | MAP_NOFLUSH);
+
 	cmdline = get_cmd_line_ptr();
-	kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
+	kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE, MAP_NOFLUSH);
 
 	/*
 	 * Also map the setup_data entries passed via boot_params in case they
@@ -172,7 +217,7 @@ void initialize_identity_maps(void *rmode)
 	while (sd) {
 		unsigned long sd_addr = (unsigned long)sd;
 
-		kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);
+		kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len, MAP_NOFLUSH);
 		sd = (struct setup_data *)sd->next;
 	}
 
@@ -185,26 +230,11 @@ void initialize_identity_maps(void *rmode)
 static pte_t *split_large_pmd(struct x86_mapping_info *info,
 			      pmd_t *pmdp, unsigned long __address)
 {
-	unsigned long page_flags;
-	unsigned long address;
-	pte_t *pte;
-	pmd_t pmd;
-	int i;
-
-	pte = (pte_t *)info->alloc_pgt_page(info->context);
+	unsigned long address = __address & PMD_MASK;
+	pte_t *pte = ident_split_large_pmd(info, pmdp, address);
 	if (!pte)
 		return NULL;
 
-	address     = __address & PMD_MASK;
-	/* No large page - clear PSE flag */
-	page_flags  = info->page_flag & ~_PAGE_PSE;
-
-	/* Populate the PTEs */
-	for (i = 0; i < PTRS_PER_PMD; i++) {
-		set_pte(&pte[i], __pte(address | page_flags));
-		address += PAGE_SIZE;
-	}
-
 	/*
 	 * Ideally we need to clear the large PMD first and do a TLB
 	 * flush before we write the new PMD. But the 2M range of the
@@ -214,7 +244,7 @@ static pte_t *split_large_pmd(struct x86_mapping_info *info,
 	 * also the only user of the page-table, so there is no chance
 	 * of a TLB multihit.
 	 */
-	pmd = __pmd((unsigned long)pte | info->kernpg_flag);
+	pmd_t pmd = __pmd((unsigned long)pte | info->kernpg_flag);
 	set_pmd(pmdp, pmd);
 	/* Flush TLB to establish the new PMD */
 	write_cr3(top_level_pgt);
@@ -377,5 +407,5 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 * Error code is sane - now identity map the 2M region around
 	 * the faulting address.
 	 */
-	kernel_add_identity_map(address, end);
+	kernel_add_identity_map(address, end, MAP_WRITE);
 }
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cf690d8712f4..d377e434c4e3 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -14,10 +14,10 @@
 
 #include "misc.h"
 #include "error.h"
-#include "pgtable.h"
 #include "../string.h"
 #include "../voffset.h"
 #include <asm/bootparam_utils.h>
+#include <asm/shared/pgtable.h>
 
 /*
  * WARNING!!
@@ -277,7 +277,8 @@ static inline void handle_relocations(void *output, unsigned long output_len,
 { }
 #endif
 
-static void parse_elf(void *output)
+static void parse_elf(void *output, unsigned long output_len,
+		      unsigned long virt_addr)
 {
 #ifdef CONFIG_X86_64
 	Elf64_Ehdr ehdr;
@@ -287,6 +288,7 @@ static void parse_elf(void *output)
 	Elf32_Phdr *phdrs, *phdr;
 #endif
 	void *dest;
+	unsigned long addr;
 	int i;
 
 	memcpy(&ehdr, output, sizeof(ehdr));
@@ -323,10 +325,50 @@ static void parse_elf(void *output)
 #endif
 			memmove(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
-		default: /* Ignore other PT_* */ break;
+		default:
+			/* Ignore other PT_* */
+			break;
+		}
+	}
+
+	handle_relocations(output, output_len, virt_addr);
+
+	if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+		goto skip_protect;
+
+	for (i = 0; i < ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+#ifdef CONFIG_RELOCATABLE
+			addr = (unsigned long)output;
+			addr += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+			addr = phdr->p_paddr;
+#endif
+			/*
+			 * Simultaneously readable and writable segments are
+			 * violating W^X, and should not be present in vmlinux image.
+			 */
+			if ((phdr->p_flags & (PF_X | PF_W)) == (PF_X | PF_W))
+				error("W^X violation for ELF segment");
+
+			unsigned int flags = MAP_PROTECT;
+			if (phdr->p_flags & PF_X)
+				flags |= MAP_EXEC;
+			if (phdr->p_flags & PF_W)
+				flags |= MAP_WRITE;
+
+			kernel_add_identity_map(addr, addr + phdr->p_memsz, flags);
+			break;
+		default:
+			/* Ignore other PT_* */
+			break;
 		}
 	}
 
+skip_protect:
 	free(phdrs);
 }
 
@@ -434,6 +476,18 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 				needed_size,
 				&virt_addr);
 
+	unsigned long phys_addr = (unsigned long)output;
+
+	/*
+	 * If KASLR is disabled input and output regions may overlap.
+	 * In this case we need to map region excutable as well.
+	 */
+	unsigned long map_flags = MAP_ALLOC | MAP_WRITE |
+			(IS_ENABLED(CONFIG_RANDOMIZE_BASE) ? 0 : MAP_EXEC);
+	output = (unsigned char *)kernel_add_identity_map(phys_addr,
+							  phys_addr + needed_size,
+							  map_flags);
+
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
 		error("Destination physical address inappropriately aligned");
@@ -456,8 +510,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	debug_putstr("\nDecompressing Linux... ");
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
-	parse_elf(output);
-	handle_relocations(output, output_len, virt_addr);
+	parse_elf(output, output_len, virt_addr);
 	debug_putstr("done.\nBooting the kernel.\n");
 
 	/* Disable exception handling before booting the kernel */
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 62208ec04ca4..a4f99516f310 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -171,8 +171,20 @@ static inline int count_immovable_mem_regions(void) { return 0; }
 #ifdef CONFIG_X86_5LEVEL
 extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
 #endif
-extern void kernel_add_identity_map(unsigned long start, unsigned long end);
-
+#ifdef CONFIG_X86_64
+extern unsigned long kernel_add_identity_map(unsigned long start,
+					     unsigned long end,
+					     unsigned int flags);
+#else
+static inline unsigned long kernel_add_identity_map(unsigned long start,
+						    unsigned long end,
+						    unsigned int flags)
+{
+	(void)flags;
+	(void)end;
+	return start;
+}
+#endif
 /* Used by PAGE_KERN* macros: */
 extern pteval_t __default_kernel_pte_mask;
 
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
deleted file mode 100644
index cc9b2529a086..000000000000
--- a/arch/x86/boot/compressed/pgtable.h
+++ /dev/null
@@ -1,20 +0,0 @@
-#ifndef BOOT_COMPRESSED_PAGETABLE_H
-#define BOOT_COMPRESSED_PAGETABLE_H
-
-#define TRAMPOLINE_32BIT_SIZE		(2 * PAGE_SIZE)
-
-#define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
-
-#define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE	0x80
-
-#define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
-
-#ifndef __ASSEMBLER__
-
-extern unsigned long *trampoline_32bit;
-
-extern void trampoline_32bit_src(void *return_ptr);
-
-#endif /* __ASSEMBLER__ */
-#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff4111b..c7cf5a1059a8 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -2,7 +2,7 @@
 #include "misc.h"
 #include <asm/e820/types.h>
 #include <asm/processor.h>
-#include "pgtable.h"
+#include <asm/shared/pgtable.h>
 #include "../string.h"
 #include "efi.h"
 
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index c93930d5ccbd..99f3ad0b30f3 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -13,6 +13,7 @@
 #include "misc.h"
 
 #include <asm/pgtable_types.h>
+#include <asm/shared/pgtable.h>
 #include <asm/sev.h>
 #include <asm/trapnr.h>
 #include <asm/trap_pf.h>
@@ -435,10 +436,11 @@ void sev_prep_identity_maps(unsigned long top_level_pgt)
 		unsigned long cc_info_pa = boot_params->cc_blob_address;
 		struct cc_blob_sev_info *cc_info;
 
-		kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info));
+		kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info), MAP_NOFLUSH);
 
 		cc_info = (struct cc_blob_sev_info *)cc_info_pa;
-		kernel_add_identity_map(cc_info->cpuid_phys, cc_info->cpuid_phys + cc_info->cpuid_len);
+		kernel_add_identity_map(cc_info->cpuid_phys,
+					cc_info->cpuid_phys + cc_info->cpuid_len, MAP_NOFLUSH);
 	}
 
 	sev_verify_cbit(top_level_pgt);
diff --git a/arch/x86/include/asm/shared/pgtable.h b/arch/x86/include/asm/shared/pgtable.h
new file mode 100644
index 000000000000..6527dadf39d6
--- /dev/null
+++ b/arch/x86/include/asm/shared/pgtable.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ASM_SHARED_PAGETABLE_H
+#define ASM_SHARED_PAGETABLE_H
+
+#define MAP_WRITE	0x02 /* Writable memory */
+#define MAP_EXEC	0x04 /* Executable memory */
+#define MAP_ALLOC	0x10 /* Range needs to be allocated */
+#define MAP_PROTECT	0x20 /* Set exact memory attributes for memory range */
+#define MAP_NOFLUSH	0x40 /* Avoid flushing TLB */
+
+#define TRAMPOLINE_32BIT_SIZE		(3 * PAGE_SIZE)
+
+#define TRAMPOLINE_32BIT_PLACEMENT_MAX	(0xA0000)
+
+#define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
+
+#define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE	0x80
+
+#define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern unsigned long *trampoline_32bit;
+
+extern void trampoline_32bit_src(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* ASM_SHARED_PAGETABLE_H */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 07/16] x86/boot: Map memory explicitly
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (5 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 06/16] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-09-06 10:41 ` [PATCH 08/16] x86/boot: Remove mapping from page fault handler Evgeniy Baskov
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Implicit mappings hide possible memory errors, e.g. allocations for
ACPI tables were not included in boot page table size.

Replace all implicit mappings from page fault handler with
explicit mappings.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/compressed/acpi.c  | 21 ++++++++++++++++++++-
 arch/x86/boot/compressed/efi.c   | 19 ++++++++++++++++++-
 arch/x86/boot/compressed/kaslr.c |  4 ++++
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 9caf89063e77..633ac56262ee 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -93,6 +93,8 @@ static u8 *scan_mem_for_rsdp(u8 *start, u32 length)
 
 	end = start + length;
 
+	kernel_add_identity_map((unsigned long)start, (unsigned long)end, 0);
+
 	/* Search from given start address for the requested length */
 	for (address = start; address < end; address += ACPI_RSDP_SCAN_STEP) {
 		/*
@@ -128,6 +130,9 @@ static acpi_physical_address bios_get_rsdp_addr(void)
 	unsigned long address;
 	u8 *rsdp;
 
+	kernel_add_identity_map((unsigned long)ACPI_EBDA_PTR_LOCATION,
+				(unsigned long)ACPI_EBDA_PTR_LOCATION + 2, 0);
+
 	/* Get the location of the Extended BIOS Data Area (EBDA) */
 	address = *(u16 *)ACPI_EBDA_PTR_LOCATION;
 	address <<= 4;
@@ -215,6 +220,9 @@ static unsigned long get_acpi_srat_table(void)
 	if (!rsdp)
 		return 0;
 
+	kernel_add_identity_map((unsigned long)rsdp,
+				(unsigned long)(rsdp + 1), 0);
+
 	/* Get ACPI root table from RSDP.*/
 	if (!(cmdline_find_option("acpi", arg, sizeof(arg)) == 4 &&
 	    !strncmp(arg, "rsdt", 4)) &&
@@ -235,6 +243,9 @@ static unsigned long get_acpi_srat_table(void)
 	if (len < sizeof(struct acpi_table_header) + size)
 		return 0;
 
+	kernel_add_identity_map((unsigned long)header,
+				(unsigned long)header + len, 0);
+
 	num_entries = (len - sizeof(struct acpi_table_header)) / size;
 	entry = (u8 *)(root_table + sizeof(struct acpi_table_header));
 
@@ -247,8 +258,16 @@ static unsigned long get_acpi_srat_table(void)
 		if (acpi_table) {
 			header = (struct acpi_table_header *)acpi_table;
 
-			if (ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_SRAT))
+			kernel_add_identity_map(acpi_table,
+						acpi_table + sizeof(*header),
+						0);
+
+			if (ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_SRAT)) {
+				kernel_add_identity_map(acpi_table,
+							acpi_table + header->length,
+							0);
 				return acpi_table;
+			}
 		}
 		entry += size;
 	}
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
index 6edd034b0b30..ce70103fbbc0 100644
--- a/arch/x86/boot/compressed/efi.c
+++ b/arch/x86/boot/compressed/efi.c
@@ -57,10 +57,14 @@ enum efi_type efi_get_type(struct boot_params *bp)
  */
 unsigned long efi_get_system_table(struct boot_params *bp)
 {
-	unsigned long sys_tbl_pa;
+	static unsigned long sys_tbl_pa __section(".data");
 	struct efi_info *ei;
+	unsigned long sys_tbl_size;
 	enum efi_type et;
 
+	if (sys_tbl_pa)
+		return sys_tbl_pa;
+
 	/* Get systab from boot params. */
 	ei = &bp->efi_info;
 #ifdef CONFIG_X86_64
@@ -73,6 +77,13 @@ unsigned long efi_get_system_table(struct boot_params *bp)
 		return 0;
 	}
 
+	if (efi_get_type(bp) == EFI_TYPE_64)
+		sys_tbl_size = sizeof(efi_system_table_64_t);
+	else
+		sys_tbl_size = sizeof(efi_system_table_32_t);
+
+	kernel_add_identity_map(sys_tbl_pa, sys_tbl_pa + sys_tbl_size, 0);
+
 	return sys_tbl_pa;
 }
 
@@ -92,6 +103,10 @@ static struct efi_setup_data *get_kexec_setup_data(struct boot_params *bp,
 
 	pa_data = bp->hdr.setup_data;
 	while (pa_data) {
+		unsigned long pa_data_end = pa_data + sizeof(struct setup_data)
+					  + sizeof(struct efi_setup_data);
+		kernel_add_identity_map(pa_data, pa_data_end, 0);
+
 		data = (struct setup_data *)pa_data;
 		if (data->type == SETUP_EFI) {
 			esd = (struct efi_setup_data *)(pa_data + sizeof(struct setup_data));
@@ -160,6 +175,8 @@ int efi_get_conf_table(struct boot_params *bp, unsigned long *cfg_tbl_pa,
 		return -EINVAL;
 	}
 
+	kernel_add_identity_map(*cfg_tbl_pa, *cfg_tbl_pa + *cfg_tbl_len, 0);
+
 	return 0;
 }
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 4a3f223973f4..073c7cfbd785 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -687,6 +687,8 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
 	u32 nr_desc;
 	int i;
 
+	kernel_add_identity_map((unsigned long)e, (unsigned long)(e + 1), 0);
+
 	signature = (char *)&e->efi_loader_signature;
 	if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
 	    strncmp(signature, EFI64_LOADER_SIGNATURE, 4))
@@ -703,6 +705,8 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
 	pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32));
 #endif
 
+	kernel_add_identity_map(pmap, pmap + e->efi_memmap_size, 0);
+
 	nr_desc = e->efi_memmap_size / e->efi_memdesc_size;
 	for (i = 0; i < nr_desc; i++) {
 		md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 08/16] x86/boot: Remove mapping from page fault handler
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (6 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 07/16] x86/boot: Map memory explicitly Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:20   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 09/16] efi/libstub: Move helper function to related file Evgeniy Baskov
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

After every implicit mapping is removed, this code is no longer needed.

Remove memory mapping from page fault handler to ensure that there are
no hidden invalid memory accesses.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/compressed/ident_map_64.c | 26 ++++++++++---------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 880e08293023..c20cd31e665f 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -385,27 +385,21 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	unsigned long address = native_read_cr2();
 	unsigned long end;
-	bool ghcb_fault;
+	char *msg;
 
-	ghcb_fault = sev_es_check_ghcb_fault(address);
+	if (sev_es_check_ghcb_fault(address))
+		msg = "Page-fault on GHCB page:";
+	else
+		msg = "Unexpected page-fault:";
 
 	address   &= PMD_MASK;
 	end        = address + PMD_SIZE;
 
 	/*
-	 * Check for unexpected error codes. Unexpected are:
-	 *	- Faults on present pages
-	 *	- User faults
-	 *	- Reserved bits set
-	 */
-	if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
-		do_pf_error("Unexpected page-fault:", error_code, address, regs->ip);
-	else if (ghcb_fault)
-		do_pf_error("Page-fault on GHCB page:", error_code, address, regs->ip);
-
-	/*
-	 * Error code is sane - now identity map the 2M region around
-	 * the faulting address.
+	 * Since all memory allocations are made explicit
+	 * now, every page fault at this stage is an
+	 * error and the error handler is there only
+	 * for debug purposes.
 	 */
-	kernel_add_identity_map(address, end, MAP_WRITE);
+	do_pf_error(msg, error_code, address, regs->ip);
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 09/16] efi/libstub: Move helper function to related file
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (7 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 08/16] x86/boot: Remove mapping from page fault handler Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:21   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 10/16] x86/boot: Make console interface more abstract Evgeniy Baskov
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

efi_adjust_memory_range_protection() can be useful outside x86-stub.c.

Move it to mem.c, where memory related code resides and make it
non-static.

Change its behavior to setup exact attibutes and disallow making memory
regions readable and writable simultaniosly for supported
configurations.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 drivers/firmware/efi/libstub/efistub.h  |   4 +
 drivers/firmware/efi/libstub/mem.c      | 101 ++++++++++++++++++++++++
 drivers/firmware/efi/libstub/x86-stub.c |  67 ++--------------
 3 files changed, 111 insertions(+), 61 deletions(-)

diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index b0ae0a454404..22fe28385db7 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -907,6 +907,10 @@ efi_status_t efi_relocate_kernel(unsigned long *image_addr,
 				 unsigned long alignment,
 				 unsigned long min_addr);
 
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+						unsigned long size,
+						unsigned long attributes);
+
 efi_status_t efi_parse_options(char const *cmdline);
 
 void efi_parse_option_graphics(char *option);
diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index feef8d4be113..89ebc8ad2c22 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -130,3 +130,104 @@ void efi_free(unsigned long size, unsigned long addr)
 	nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
 	efi_bs_call(free_pages, addr, nr_pages);
 }
+
+/**
+ * efi_adjust_memory_range_protection() - change memory range protection attributes
+ * @start:	memory range start address
+ * @size:	memory range size
+ *
+ * Actual memory range for which memory attributes are modified is
+ * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
+ * that includes [start, start + size].
+ *
+ * @return: status code
+ */
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+						unsigned long size,
+						unsigned long attributes)
+{
+	efi_status_t status;
+	efi_gcd_memory_space_desc_t desc;
+	efi_physical_addr_t end, next;
+	efi_physical_addr_t rounded_start, rounded_end;
+	efi_physical_addr_t unprotect_start, unprotect_size;
+	int has_system_memory = 0;
+
+	if (efi_dxe_table == NULL)
+		return EFI_UNSUPPORTED;
+
+	/*
+	 * This function should not be used to modify attributes
+	 * other than writable/executable.
+	 */
+
+	if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
+		return EFI_INVALID_PARAMETER;
+
+	/*
+	 * Disallow simultaniously executable and writable memory
+	 * to inforce W^X policy if direct extraction code is enabled.
+	 */
+
+	if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
+	    IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))
+		return EFI_INVALID_PARAMETER;
+
+	rounded_start = rounddown(start, EFI_PAGE_SIZE);
+	rounded_end = roundup(start + size, EFI_PAGE_SIZE);
+
+	/*
+	 * Don't modify memory region attributes, they are
+	 * already suitable, to lower the possibility to
+	 * encounter firmware bugs.
+	 */
+
+	for (end = start + size; start < end; start = next) {
+
+		status = efi_dxe_call(get_memory_space_descriptor,
+				      start, &desc);
+
+		if (status != EFI_SUCCESS) {
+			efi_warn("Unable to get memory descriptor at %lx\n",
+				 start);
+			return status;
+		}
+
+		next = desc.base_address + desc.length;
+
+		/*
+		 * Only system memory is suitable for trampoline/kernel image
+		 * placement, so only this type of memory needs its attributes
+		 * to be modified.
+		 */
+
+		if (desc.gcd_memory_type != EfiGcdMemoryTypeSystemMemory) {
+			efi_warn("Attempted to change protection of special memory range\n");
+			return EFI_UNSUPPORTED;
+		}
+
+		if (((desc.attributes ^ attributes) &
+		     (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0)
+			continue;
+
+		desc.attributes &= ~(EFI_MEMORY_RO | EFI_MEMORY_XP);
+		desc.attributes |= attributes;
+
+		unprotect_start = max(rounded_start, desc.base_address);
+		unprotect_size = min(rounded_end, next) - unprotect_start;
+
+		status = efi_dxe_call(set_memory_space_attributes,
+				      unprotect_start, unprotect_size,
+				      desc.attributes);
+
+		if (status != EFI_SUCCESS) {
+			efi_warn("Unable to unprotect memory range [%08lx,%08lx]: %lx\n",
+				 (unsigned long)unprotect_start,
+				 (unsigned long)(unprotect_start + unprotect_size),
+				 status);
+			return status;
+		}
+	}
+
+	return EFI_SUCCESS;
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 05ae8bcc9d67..678f9c2ccafc 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -212,62 +212,6 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
 	}
 }
 
-static void
-adjust_memory_range_protection(unsigned long start, unsigned long size)
-{
-	efi_status_t status;
-	efi_gcd_memory_space_desc_t desc;
-	unsigned long end, next;
-	unsigned long rounded_start, rounded_end;
-	unsigned long unprotect_start, unprotect_size;
-	int has_system_memory = 0;
-
-	if (efi_dxe_table == NULL)
-		return;
-
-	rounded_start = rounddown(start, EFI_PAGE_SIZE);
-	rounded_end = roundup(start + size, EFI_PAGE_SIZE);
-
-	/*
-	 * Don't modify memory region attributes, they are
-	 * already suitable, to lower the possibility to
-	 * encounter firmware bugs.
-	 */
-
-	for (end = start + size; start < end; start = next) {
-
-		status = efi_dxe_call(get_memory_space_descriptor, start, &desc);
-
-		if (status != EFI_SUCCESS)
-			return;
-
-		next = desc.base_address + desc.length;
-
-		/*
-		 * Only system memory is suitable for trampoline/kernel image placement,
-		 * so only this type of memory needs its attributes to be modified.
-		 */
-
-		if (desc.gcd_memory_type != EfiGcdMemoryTypeSystemMemory ||
-		    (desc.attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0)
-			continue;
-
-		unprotect_start = max(rounded_start, (unsigned long)desc.base_address);
-		unprotect_size = min(rounded_end, next) - unprotect_start;
-
-		status = efi_dxe_call(set_memory_space_attributes,
-				      unprotect_start, unprotect_size,
-				      EFI_MEMORY_WB);
-
-		if (status != EFI_SUCCESS) {
-			efi_warn("Unable to unprotect memory range [%08lx,%08lx]: %lx\n",
-				 unprotect_start,
-				 unprotect_start + unprotect_size,
-				 status);
-		}
-	}
-}
-
 /*
  * Trampoline takes 2 pages and can be loaded in first megabyte of memory
  * with its end placed between 128k and 640k where BIOS might start.
@@ -291,12 +235,12 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
 	 * and relocated kernel image.
 	 */
 
-	adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
-				       TRAMPOLINE_PLACEMENT_SIZE);
+	efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
+					   TRAMPOLINE_PLACEMENT_SIZE, 0);
 
 #ifdef CONFIG_64BIT
 	if (image_base != (unsigned long)startup_32)
-		adjust_memory_range_protection(image_base, image_size);
+		efi_adjust_memory_range_protection(image_base, image_size, 0);
 #else
 	/*
 	 * Clear protection flags on a whole range of possible
@@ -306,8 +250,9 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
 	 * need to remove possible protection on relocated image
 	 * itself disregarding further relocations.
 	 */
-	adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
-				       KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
+	efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
+					   KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR,
+					   0);
 #endif
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 10/16] x86/boot: Make console interface more abstract
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (8 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 09/16] efi/libstub: Move helper function to related file Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:23   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 11/16] x86/boot: Split trampoline and pt init code Evgeniy Baskov
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

To be able to extract kernel from EFI, console output functions
need to be replaceable by alternative implementations.

Make all of those functions pointers.
Move serial console code to separate file.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/compressed/Makefile       |   2 +-
 arch/x86/boot/compressed/ident_map_64.c |  15 ++-
 arch/x86/boot/compressed/misc.c         | 109 +--------------------
 arch/x86/boot/compressed/misc.h         |  13 ++-
 arch/x86/boot/compressed/putstr.c       | 124 ++++++++++++++++++++++++
 5 files changed, 146 insertions(+), 117 deletions(-)
 create mode 100644 arch/x86/boot/compressed/putstr.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 35ce1a64068b..29411864bfcd 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -92,7 +92,7 @@ $(obj)/misc.o: $(obj)/../voffset.h
 
 vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o \
 	$(obj)/misc.o $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
-	$(obj)/piggy.o $(obj)/cpuflags.o
+	$(obj)/piggy.o $(obj)/cpuflags.o $(obj)/putstr.o
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index c20cd31e665f..c39373687e50 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -89,12 +89,20 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
 static struct x86_mapping_info mapping_info;
 int has_nx;
 
+/*
+ * This points to actual implementation of mapping function.
+ * Either the one down below or the UEFI API wrapper.
+ */
+unsigned long (*kernel_add_identity_map)(unsigned long start,
+					 unsigned long end,
+					 unsigned int flags);
+
 /*
  * Adds the specified range to the identity mappings.
  */
-unsigned long kernel_add_identity_map(unsigned long start,
-				      unsigned long end,
-				      unsigned int flags)
+unsigned long kernel_add_identity_map_(unsigned long start,
+				       unsigned long end,
+				       unsigned int flags)
 {
 	int ret;
 
@@ -136,6 +144,7 @@ void initialize_identity_maps(void *rmode)
 	struct setup_data *sd;
 
 	boot_params = rmode;
+	kernel_add_identity_map = kernel_add_identity_map_;
 
 	/* Exclude the encryption mask from __PHYSICAL_MASK */
 	physical_mask &= ~sme_me_mask;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index d377e434c4e3..e2c0d05ac293 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -53,13 +53,6 @@ struct port_io_ops pio_ops;
 memptr free_mem_ptr;
 memptr free_mem_end_ptr;
 
-static char *vidmem;
-static int vidport;
-
-/* These might be accessed before .bss is cleared, so use .data instead. */
-static int lines __section(".data");
-static int cols __section(".data");
-
 #ifdef CONFIG_KERNEL_GZIP
 #include "../../../../lib/decompress_inflate.c"
 #endif
@@ -92,95 +85,6 @@ static int cols __section(".data");
  * ../header.S.
  */
 
-static void scroll(void)
-{
-	int i;
-
-	memmove(vidmem, vidmem + cols * 2, (lines - 1) * cols * 2);
-	for (i = (lines - 1) * cols * 2; i < lines * cols * 2; i += 2)
-		vidmem[i] = ' ';
-}
-
-#define XMTRDY          0x20
-
-#define TXR             0       /*  Transmit register (WRITE) */
-#define LSR             5       /*  Line Status               */
-static void serial_putchar(int ch)
-{
-	unsigned timeout = 0xffff;
-
-	while ((inb(early_serial_base + LSR) & XMTRDY) == 0 && --timeout)
-		cpu_relax();
-
-	outb(ch, early_serial_base + TXR);
-}
-
-void __putstr(const char *s)
-{
-	int x, y, pos;
-	char c;
-
-	if (early_serial_base) {
-		const char *str = s;
-		while (*str) {
-			if (*str == '\n')
-				serial_putchar('\r');
-			serial_putchar(*str++);
-		}
-	}
-
-	if (lines == 0 || cols == 0)
-		return;
-
-	x = boot_params->screen_info.orig_x;
-	y = boot_params->screen_info.orig_y;
-
-	while ((c = *s++) != '\0') {
-		if (c == '\n') {
-			x = 0;
-			if (++y >= lines) {
-				scroll();
-				y--;
-			}
-		} else {
-			vidmem[(x + cols * y) * 2] = c;
-			if (++x >= cols) {
-				x = 0;
-				if (++y >= lines) {
-					scroll();
-					y--;
-				}
-			}
-		}
-	}
-
-	boot_params->screen_info.orig_x = x;
-	boot_params->screen_info.orig_y = y;
-
-	pos = (x + cols * y) * 2;	/* Update cursor position */
-	outb(14, vidport);
-	outb(0xff & (pos >> 9), vidport+1);
-	outb(15, vidport);
-	outb(0xff & (pos >> 1), vidport+1);
-}
-
-void __puthex(unsigned long value)
-{
-	char alpha[2] = "0";
-	int bits;
-
-	for (bits = sizeof(value) * 8 - 4; bits >= 0; bits -= 4) {
-		unsigned long digit = (value >> bits) & 0xf;
-
-		if (digit < 0xA)
-			alpha[0] = '0' + digit;
-		else
-			alpha[0] = 'a' + (digit - 0xA);
-
-		__putstr(alpha);
-	}
-}
-
 #ifdef CONFIG_X86_NEED_RELOCS
 static void handle_relocations(void *output, unsigned long output_len,
 			       unsigned long virt_addr)
@@ -407,17 +311,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 
 	sanitize_boot_params(boot_params);
 
-	if (boot_params->screen_info.orig_video_mode == 7) {
-		vidmem = (char *) 0xb0000;
-		vidport = 0x3b4;
-	} else {
-		vidmem = (char *) 0xb8000;
-		vidport = 0x3d4;
-	}
-
-	lines = boot_params->screen_info.orig_video_lines;
-	cols = boot_params->screen_info.orig_video_cols;
-
 	init_default_io_ops();
 
 	/*
@@ -428,7 +321,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	 */
 	early_tdx_detect();
 
-	console_init();
+	init_bare_console();
 
 	/*
 	 * Save RSDP address for later use. Have this after console_init()
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index a4f99516f310..39dc3de50268 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -53,8 +53,8 @@ extern memptr free_mem_end_ptr;
 void *malloc(int size);
 void free(void *where);
 extern struct boot_params *boot_params;
-void __putstr(const char *s);
-void __puthex(unsigned long value);
+extern void (*__putstr)(const char *s);
+extern void (*__puthex)(unsigned long value);
 #define error_putstr(__x)  __putstr(__x)
 #define error_puthex(__x)  __puthex(__x)
 
@@ -124,6 +124,9 @@ static inline void console_init(void)
 { }
 #endif
 
+/* putstr.c */
+void init_bare_console(void);
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 void sev_enable(struct boot_params *bp);
 void sev_es_shutdown_ghcb(void);
@@ -172,9 +175,9 @@ static inline int count_immovable_mem_regions(void) { return 0; }
 extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
 #endif
 #ifdef CONFIG_X86_64
-extern unsigned long kernel_add_identity_map(unsigned long start,
-					     unsigned long end,
-					     unsigned int flags);
+extern unsigned long (*kernel_add_identity_map)(unsigned long start,
+						unsigned long end,
+						unsigned int flags);
 #else
 static inline unsigned long kernel_add_identity_map(unsigned long start,
 						    unsigned long end,
diff --git a/arch/x86/boot/compressed/putstr.c b/arch/x86/boot/compressed/putstr.c
new file mode 100644
index 000000000000..accba0de8be9
--- /dev/null
+++ b/arch/x86/boot/compressed/putstr.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "misc.h"
+
+/* These might be accessed before .bss is cleared, so use .data instead. */
+static char *vidmem __section(".data");
+static int vidport __section(".data");
+static int lines __section(".data");
+static int cols __section(".data");
+
+void (*__putstr)(const char *s);
+void (*__puthex)(unsigned long value);
+
+static void putstr(const char *s);
+static void puthex(unsigned long value);
+
+void init_bare_console(void)
+{
+	__putstr = putstr;
+	__puthex = puthex;
+
+	if (boot_params->screen_info.orig_video_mode == 7) {
+		vidmem = (char *) 0xb0000;
+		vidport = 0x3b4;
+	} else {
+		vidmem = (char *) 0xb8000;
+		vidport = 0x3d4;
+	}
+
+	lines = boot_params->screen_info.orig_video_lines;
+	cols = boot_params->screen_info.orig_video_cols;
+
+	console_init();
+}
+
+static void scroll(void)
+{
+	int i;
+
+	memmove(vidmem, vidmem + cols * 2, (lines - 1) * cols * 2);
+	for (i = (lines - 1) * cols * 2; i < lines * cols * 2; i += 2)
+		vidmem[i] = ' ';
+}
+
+#define XMTRDY          0x20
+
+#define TXR             0       /*  Transmit register (WRITE) */
+#define LSR             5       /*  Line Status               */
+
+static void serial_putchar(int ch)
+{
+	unsigned int timeout = 0xffff;
+
+	while ((inb(early_serial_base + LSR) & XMTRDY) == 0 && --timeout)
+		cpu_relax();
+
+	outb(ch, early_serial_base + TXR);
+}
+
+static void putstr(const char *s)
+{
+	int x, y, pos;
+	char c;
+
+	if (early_serial_base) {
+		const char *str = s;
+
+		while (*str) {
+			if (*str == '\n')
+				serial_putchar('\r');
+			serial_putchar(*str++);
+		}
+	}
+
+	if (lines == 0 || cols == 0)
+		return;
+
+	x = boot_params->screen_info.orig_x;
+	y = boot_params->screen_info.orig_y;
+
+	while ((c = *s++) != '\0') {
+		if (c == '\n') {
+			x = 0;
+			if (++y >= lines) {
+				scroll();
+				y--;
+			}
+		} else {
+			vidmem[(x + cols * y) * 2] = c;
+			if (++x >= cols) {
+				x = 0;
+				if (++y >= lines) {
+					scroll();
+					y--;
+				}
+			}
+		}
+	}
+
+	boot_params->screen_info.orig_x = x;
+	boot_params->screen_info.orig_y = y;
+
+	pos = (x + cols * y) * 2;	/* Update cursor position */
+	outb(14, vidport);
+	outb(0xff & (pos >> 9), vidport+1);
+	outb(15, vidport);
+	outb(0xff & (pos >> 1), vidport+1);
+}
+
+static void puthex(unsigned long value)
+{
+	char alpha[2] = "0";
+	int bits;
+
+	for (bits = sizeof(value) * 8 - 4; bits >= 0; bits -= 4) {
+		unsigned long digit = (value >> bits) & 0xf;
+
+		if (digit < 0xA)
+			alpha[0] = '0' + digit;
+		else
+			alpha[0] = 'a' + (digit - 0xA);
+
+		putstr(alpha);
+	}
+}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 11/16] x86/boot: Split trampoline and pt init code
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (9 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 10/16] x86/boot: Make console interface more abstract Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-09-06 10:41 ` [PATCH 12/16] x86/boot: Add EFI kernel extraction interface Evgeniy Baskov
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

When allocating trampoline from libstub trampoline allocation is
performed separately, so it needs to be skipped.

Split trampoline initialization and allocation code into two
functions to make them invokable separately.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/compressed/pgtable_64.c | 73 +++++++++++++++++----------
 1 file changed, 46 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index c7cf5a1059a8..1f7169248612 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -106,12 +106,8 @@ static unsigned long find_trampoline_placement(void)
 	return bios_start - TRAMPOLINE_32BIT_SIZE;
 }
 
-struct paging_config paging_prepare(void *rmode)
+bool trampoline_pgtable_init(struct boot_params *boot_params)
 {
-	struct paging_config paging_config = {};
-
-	/* Initialize boot_params. Required for cmdline_find_option_bool(). */
-	boot_params = rmode;
 
 	/*
 	 * Check if LA57 is desired and supported.
@@ -125,26 +121,10 @@ struct paging_config paging_prepare(void *rmode)
 	 *
 	 * That's substitute for boot_cpu_has() in early boot code.
 	 */
-	if (IS_ENABLED(CONFIG_X86_5LEVEL) &&
-			!cmdline_find_option_bool("no5lvl") &&
-			native_cpuid_eax(0) >= 7 &&
-			(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
-		paging_config.l5_required = 1;
-	}
-
-	paging_config.trampoline_start = find_trampoline_placement();
-
-	trampoline_32bit = (unsigned long *)paging_config.trampoline_start;
-
-	/* Preserve trampoline memory */
-	memcpy(trampoline_save, trampoline_32bit, TRAMPOLINE_32BIT_SIZE);
-
-	/* Clear trampoline memory first */
-	memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
-
-	/* Copy trampoline code in place */
-	memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
-			&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+	bool l5_required = IS_ENABLED(CONFIG_X86_5LEVEL) &&
+			   !cmdline_find_option_bool("no5lvl") &&
+			   native_cpuid_eax(0) >= 7 &&
+			   (native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)));
 
 	/*
 	 * The code below prepares page table in trampoline memory.
@@ -160,10 +140,10 @@ struct paging_config paging_prepare(void *rmode)
 	 * We are not going to use the page table in trampoline memory if we
 	 * are already in the desired paging mode.
 	 */
-	if (paging_config.l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+	if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
 		goto out;
 
-	if (paging_config.l5_required) {
+	if (l5_required) {
 		/*
 		 * For 4- to 5-level paging transition, set up current CR3 as
 		 * the first and the only entry in a new top-level page table.
@@ -185,6 +165,45 @@ struct paging_config paging_prepare(void *rmode)
 		       (void *)src, PAGE_SIZE);
 	}
 
+out:
+	return l5_required;
+}
+
+struct paging_config paging_prepare(void *rmode)
+{
+	struct paging_config paging_config = {};
+	bool early_trampoline_alloc = 0;
+
+	/* Initialize boot_params. Required for cmdline_find_option_bool(). */
+	boot_params = rmode;
+
+	/*
+	 * We only need to find trampoline placement, if we have
+	 * not already done it from libstub.
+	 */
+
+	paging_config.trampoline_start = find_trampoline_placement();
+	trampoline_32bit = (unsigned long *)paging_config.trampoline_start;
+	early_trampoline_alloc = 0;
+
+	/*
+	 * Preserve trampoline memory.
+	 * When trampoline is located in memory
+	 * owned by us, i.e. allocated in EFISTUB,
+	 * we don't care about previous contents
+	 * of this memory so copying can also be skipped.
+	 */
+	memcpy(trampoline_save, trampoline_32bit, TRAMPOLINE_32BIT_SIZE);
+
+	/* Clear trampoline memory first */
+	memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
+
+	/* Copy trampoline code in place */
+	memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+			&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+
+	paging_config.l5_required = trampoline_pgtable_init(boot_params);
+
 out:
 	return paging_config;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 12/16] x86/boot: Add EFI kernel extraction interface
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (10 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 11/16] x86/boot: Split trampoline and pt init code Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:27   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 13/16] efi/x86: Support extracting kernel from libstub Evgeniy Baskov
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

To enable extraction of kernel image from EFI stub code directly
extraction code needs to have separate interface that avoid part
of low level initialization logic, i.e. serial port setup.

Add kernel extraction function callable from libstub as a part
of preparation for extracting the kernel directly from EFI environment.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/compressed/head_32.S |  3 +-
 arch/x86/boot/compressed/head_64.S |  2 +-
 arch/x86/boot/compressed/misc.c    | 85 +++++++++++++++++++++---------
 arch/x86/boot/compressed/misc.h    |  2 +
 arch/x86/boot/compressed/putstr.c  |  9 ++++
 5 files changed, 73 insertions(+), 28 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 3b354eb9516d..b46a1c4109cf 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -217,8 +217,7 @@ SYM_DATA(image_offset, .long 0)
  */
 	.bss
 	.balign 4
-boot_heap:
-	.fill BOOT_HEAP_SIZE, 1, 0
+SYM_DATA(boot_heap,	.fill BOOT_HEAP_SIZE, 1, 0)
 boot_stack:
 	.fill BOOT_STACK_SIZE, 1, 0
 boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 889ca7176aa7..37ce094571b5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -1007,7 +1007,7 @@ SYM_FUNC_END(startup32_check_sev_cbit)
  */
 	.bss
 	.balign 4
-SYM_DATA_LOCAL(boot_heap,	.fill BOOT_HEAP_SIZE, 1, 0)
+SYM_DATA(boot_heap,	.fill BOOT_HEAP_SIZE, 1, 0)
 
 SYM_DATA_START_LOCAL(boot_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index e2c0d05ac293..8016cc5c300e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -293,11 +293,11 @@ static void parse_elf(void *output, unsigned long output_len,
  *             |-------uncompressed kernel image---------|
  *
  */
-asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
-				  unsigned char *input_data,
-				  unsigned long input_len,
-				  unsigned char *output,
-				  unsigned long output_len)
+static void *do_extract_kernel(void *rmode,
+			       unsigned char *input_data,
+			       unsigned long input_len,
+			       unsigned char *output,
+			       unsigned long output_len)
 {
 	const unsigned long kernel_total_size = VO__end - VO__text;
 	unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
@@ -311,18 +311,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 
 	sanitize_boot_params(boot_params);
 
-	init_default_io_ops();
-
-	/*
-	 * Detect TDX guest environment.
-	 *
-	 * It has to be done before console_init() in order to use
-	 * paravirtualized port I/O operations if needed.
-	 */
-	early_tdx_detect();
-
-	init_bare_console();
-
 	/*
 	 * Save RSDP address for later use. Have this after console_init()
 	 * so that early debugging output from the RSDP parsing code can be
@@ -330,11 +318,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	 */
 	boot_params->acpi_rsdp_addr = get_rsdp_addr();
 
-	debug_putstr("early console in extract_kernel\n");
-
-	free_mem_ptr     = heap;	/* Heap */
-	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
-
 	/*
 	 * The memory hole needed for the kernel is the larger of either
 	 * the entire decompressed kernel plus relocation table, or the
@@ -387,12 +370,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	if (virt_addr & (MIN_KERNEL_ALIGN - 1))
 		error("Destination virtual address inappropriately aligned");
 #ifdef CONFIG_X86_64
-	if (heap > 0x3fffffffffffUL)
+	if (phys_addr > 0x3fffffffffffUL)
 		error("Destination address too large");
 	if (virt_addr + max(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
 		error("Destination virtual address is beyond the kernel mapping area");
 #else
-	if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
+	if (phys_addr > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
 		error("Destination address too large");
 #endif
 #ifndef CONFIG_RELOCATABLE
@@ -406,12 +389,64 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	parse_elf(output, output_len, virt_addr);
 	debug_putstr("done.\nBooting the kernel.\n");
 
+	return output;
+}
+
+asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
+				  unsigned char *input_data,
+				  unsigned long input_len,
+				  unsigned char *output,
+				  unsigned long output_len)
+{
+	void *entry;
+
+	init_default_io_ops();
+
+	/*
+	 * Detect TDX guest environment.
+	 *
+	 * It has to be done before console_init() in order to use
+	 * paravirtualized port I/O operations if needed.
+	 */
+	early_tdx_detect();
+
+	init_bare_console();
+
+	debug_putstr("early console in extract_kernel\n");
+
+	free_mem_ptr     = heap;	/* Heap */
+	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
+
+	entry = do_extract_kernel(rmode, input_data,
+				  input_len, output, output_len);
+
 	/* Disable exception handling before booting the kernel */
 	cleanup_exception_handling();
 
-	return output;
+	return entry;
 }
 
+void *efi_extract_kernel(struct boot_params *rmode,
+			 struct efi_iofunc *iofunc,
+			 unsigned char *input_data,
+			 unsigned long input_len,
+			 unsigned char *output,
+			 unsigned long output_len)
+{
+	extern char boot_heap[BOOT_HEAP_SIZE];
+
+	free_mem_ptr     = (unsigned long)boot_heap;	/* Heap */
+	free_mem_end_ptr = (unsigned long)boot_heap + BOOT_HEAP_SIZE;
+
+	init_efi_console(iofunc);
+
+	return do_extract_kernel(rmode, input_data,
+				 input_len, output, output_len);
+}
+
+
+
+
 void fortify_panic(const char *name)
 {
 	error("detected buffer overflow");
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 39dc3de50268..b5aa0af6c59e 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -26,6 +26,7 @@
 #include <asm/boot.h>
 #include <asm/bootparam.h>
 #include <asm/desc_defs.h>
+#include <asm/shared/extract.h>
 
 #include "tdx.h"
 
@@ -126,6 +127,7 @@ static inline void console_init(void)
 
 /* putstr.c */
 void init_bare_console(void);
+void init_efi_console(struct efi_iofunc *iofunc);
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 void sev_enable(struct boot_params *bp);
diff --git a/arch/x86/boot/compressed/putstr.c b/arch/x86/boot/compressed/putstr.c
index accba0de8be9..238d9677df61 100644
--- a/arch/x86/boot/compressed/putstr.c
+++ b/arch/x86/boot/compressed/putstr.c
@@ -32,6 +32,15 @@ void init_bare_console(void)
 	console_init();
 }
 
+void init_efi_console(struct efi_iofunc *iofunc)
+{
+	__putstr = iofunc->putstr;
+	__puthex = iofunc->puthex;
+#ifdef CONFIG_X86_64
+	kernel_add_identity_map = iofunc->map_range;
+#endif
+}
+
 static void scroll(void)
 {
 	int i;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 13/16] efi/x86: Support extracting kernel from libstub
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (11 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 12/16] x86/boot: Add EFI kernel extraction interface Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:35   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 14/16] x86/build: Make generated PE more spec compliant Evgeniy Baskov
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Doing it that way allows setting up stricter memory attributes,
simplifies boot code path and removes potential relocation
of kernel image.

Wire up required interfaces and minimally initialize zero page
fields needed for it to function correctly.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

 create mode 100644 arch/x86/include/asm/shared/extract.h
 create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
---
 arch/x86/boot/compressed/head_32.S            |   6 +-
 arch/x86/boot/compressed/head_64.S            |  45 ++++
 arch/x86/include/asm/shared/extract.h         |  25 ++
 drivers/firmware/efi/Kconfig                  |  14 ++
 drivers/firmware/efi/libstub/Makefile         |   1 +
 drivers/firmware/efi/libstub/efistub.h        |   5 +
 .../firmware/efi/libstub/x86-extract-direct.c | 220 ++++++++++++++++++
 drivers/firmware/efi/libstub/x86-stub.c       |  45 ++--
 8 files changed, 343 insertions(+), 18 deletions(-)
 create mode 100644 arch/x86/include/asm/shared/extract.h
 create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index b46a1c4109cf..d2866f06bc9f 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -155,7 +155,11 @@ SYM_FUNC_START(efi32_stub_entry)
 	add	$0x4, %esp
 	movl	8(%esp), %esi	/* save boot_params pointer */
 	call	efi_main
-	/* efi_main returns the possibly relocated address of startup_32 */
+
+	/*
+	 * efi_main returns the possibly
+	 * relocated address of exteracted kernel entry point.
+	 */
 	jmp	*%eax
 SYM_FUNC_END(efi32_stub_entry)
 SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 37ce094571b5..b6bae8e7ee71 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -555,9 +555,54 @@ SYM_FUNC_START(efi64_stub_entry)
 	and	$~0xf, %rsp			/* realign the stack */
 	movq	%rdx, %rbx			/* save boot_params pointer */
 	call	efi_main
+
+#ifdef CONFIG_EFI_STUB_EXTRACT_DIRECT
+	cld
+	cli
+
+	movq	%rbx, %rdi /* boot_params */
+	movq	%rax, %rsi /* decompressed kernel address */
+
+	/* Make sure we have GDT with 32-bit code segment */
+	leaq	gdt64(%rip), %rax
+	addq	%rax, 2(%rax)
+	lgdt	(%rax)
+
+	/* Setup data segments. */
+	xorl	%eax, %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %ss
+	movl	%eax, %fs
+	movl	%eax, %gs
+
+	pushq	%rsi
+	pushq	%rdi
+
+	call startup32_enable_nx_if_supported
+
+	call	trampoline_pgtable_init
+	movq	%rax, %rdx
+
+
+	/* Swap %rsi and %rsi */
+	popq	%rsi
+	popq	%rdi
+
+	/* Save the trampoline address in RCX */
+	movq	trampoline_32bit(%rip), %rcx
+
+	/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
+	pushq	$__KERNEL32_CS
+	leaq	TRAMPOLINE_32BIT_CODE_OFFSET(%rcx), %rax
+	pushq	%rax
+	lretq
+#else
 	movq	%rbx,%rsi
 	leaq	rva(startup_64)(%rax), %rax
 	jmp	*%rax
+#endif
+
 SYM_FUNC_END(efi64_stub_entry)
 SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
 #endif
diff --git a/arch/x86/include/asm/shared/extract.h b/arch/x86/include/asm/shared/extract.h
new file mode 100644
index 000000000000..163678145884
--- /dev/null
+++ b/arch/x86/include/asm/shared/extract.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ASM_SHARED_EXTRACT_H
+#define ASM_SHARED_EXTRACT_H
+
+#define MAP_WRITE	0x02 /* Writable memory */
+#define MAP_EXEC	0x04 /* Executable memory */
+#define MAP_ALLOC	0x10 /* Range needs to be allocated */
+#define MAP_PROTECT	0x20 /* Set exact memory attributes for memory range */
+
+struct efi_iofunc {
+	void (*putstr)(const char *msg);
+	void (*puthex)(unsigned long x);
+	unsigned long (*map_range)(unsigned long start,
+				   unsigned long end,
+				   unsigned int flags);
+};
+
+void *efi_extract_kernel(struct boot_params *rmode,
+			 struct efi_iofunc *iofunc,
+			 unsigned char *input_data,
+			 unsigned long input_len,
+			 unsigned char *output,
+			 unsigned long output_len);
+
+#endif /* ASM_SHARED_EXTRACT_H */
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 6cb7384ad2ac..2418402a0bda 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -91,6 +91,20 @@ config EFI_DXE_MEM_ATTRIBUTES
 	  Use DXE services to check and alter memory protection
 	  attributes during boot via EFISTUB to ensure that memory
 	  ranges used by the kernel are writable and executable.
+	  This option also enables stricter memory attributes
+	  on compressed kernel PE image.
+
+config EFI_STUB_EXTRACT_DIRECT
+	bool "Extract kernel directly from UEFI environment"
+	depends on EFI && EFI_STUB && X86_64
+	default y
+	help
+	  Extract kernel before exiting EFI boot services
+	  This allows maintaining W^X for kernel image for
+	  the whole execution of compressed kernel code.
+	  This also slightly improves efficiency of extraction
+	  code by removing the need to copy the kernel around
+	  and rebuild page tables.
 
 config EFI_PARAMS_FROM_FDT
 	bool
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e..1cea7d913356 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -69,6 +69,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB)	+= efi-stub.o fdt.o string.o \
 lib-$(CONFIG_ARM)		+= arm32-stub.o
 lib-$(CONFIG_ARM64)		+= arm64-stub.o
 lib-$(CONFIG_X86)		+= x86-stub.o
+lib-$(CONFIG_EFI_STUB_EXTRACT_DIRECT)	+= x86-extract-direct.o
 lib-$(CONFIG_RISCV)		+= riscv-stub.o
 CFLAGS_arm32-stub.o		:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 22fe28385db7..cdd1bb50c786 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -968,6 +968,11 @@ static inline void
 efi_enable_reset_attack_mitigation(void) { }
 #endif
 
+#ifdef CONFIG_X86
+unsigned long extract_kernel_direct(struct boot_params *boot_params);
+void startup_32(struct boot_params *boot_params);
+#endif
+
 void efi_retrieve_tpm2_eventlog(void);
 
 #endif
diff --git a/drivers/firmware/efi/libstub/x86-extract-direct.c b/drivers/firmware/efi/libstub/x86-extract-direct.c
new file mode 100644
index 000000000000..6076bd75cfd6
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-extract-direct.c
@@ -0,0 +1,220 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/acpi.h>
+#include <linux/efi.h>
+#include <linux/elf.h>
+#include <linux/stddef.h>
+
+#include <asm/efi.h>
+#include <asm/e820/types.h>
+#include <asm/desc.h>
+#include <asm/boot.h>
+#include <asm/bootparam_utils.h>
+#include <asm/shared/extract.h>
+#include <asm/shared/pgtable.h>
+
+#include "efistub.h"
+
+static void do_puthex(unsigned long value);
+static void do_putstr(const char *msg);
+
+static unsigned long do_map_range(unsigned long start,
+				  unsigned long end,
+				  unsigned int flags)
+{
+	efi_status_t status;
+
+	unsigned long size = end - start;
+
+	if (flags & MAP_ALLOC) {
+		if (start == (unsigned long)startup_32)
+			start = LOAD_PHYSICAL_ADDR;
+
+		unsigned long addr;
+
+		status = efi_low_alloc_above(size, CONFIG_PHYSICAL_ALIGN,
+					     &addr, start);
+		if (status != EFI_SUCCESS)
+			efi_err("Unable to allocate memory for uncompressed kernel");
+
+		if (start != addr) {
+			efi_debug("Unable to allocate at given address"
+				  " (desired=0x%lx, actual=0x%lx)",
+				  (unsigned long)start, addr);
+			start = addr;
+		}
+	}
+
+	if (flags & (MAP_PROTECT | MAP_ALLOC)) {
+		unsigned long attr = 0;
+
+		if (!(flags & MAP_EXEC))
+			attr |= EFI_MEMORY_XP;
+
+		if (!(flags & MAP_WRITE))
+			attr |= EFI_MEMORY_RO;
+
+		status = efi_adjust_memory_range_protection(start,
+							    end - start,
+							    attr);
+		if (status != EFI_SUCCESS)
+			efi_err("Unable to protect memory range");
+	}
+
+	return start;
+}
+
+/*
+ * Trampoline takes 3 pages and can be loaded in first megabyte of memory
+ * with its end placed between 0 and 640k where BIOS might start.
+ * (see arch/x86/boot/compressed/pgtable_64.c)
+ */
+
+#ifdef CONFIG_64BIT
+static efi_status_t prepare_trampoline(void)
+{
+	efi_status_t status;
+
+	status = efi_allocate_pages(TRAMPOLINE_32BIT_SIZE,
+				    (unsigned long *)&trampoline_32bit,
+				    TRAMPOLINE_32BIT_PLACEMENT_MAX);
+
+	if (status != EFI_SUCCESS)
+		return status;
+
+	unsigned long trampoline_start = (unsigned long)trampoline_32bit;
+
+	memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
+
+	/* First page of trampoline is a top level page table */
+	efi_adjust_memory_range_protection(trampoline_start,
+					   PAGE_SIZE,
+					   EFI_MEMORY_XP);
+
+	/* Second page of trampoline is the code (with a padding) */
+
+	void *caddr = (void *)trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET;
+
+	memcpy(caddr, trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+
+	efi_adjust_memory_range_protection((unsigned long)caddr,
+					   PAGE_SIZE,
+					   EFI_MEMORY_RO);
+
+	/* And the last page of trampoline is the stack */
+
+	efi_adjust_memory_range_protection(trampoline_start + 2 * PAGE_SIZE,
+					   PAGE_SIZE,
+					   EFI_MEMORY_XP);
+
+	return EFI_SUCCESS;
+}
+#else
+static inline efi_status_t prepare_trampoline(void)
+{
+	return EFI_SUCCESS;
+}
+#endif
+
+static efi_status_t init_loader_data(struct boot_params *params)
+{
+	struct efi_info *efi = (void *)&params->efi_info;
+	efi_status_t status;
+
+	unsigned long map_size, desc_size, buff_size;
+	u32 desc_ver;
+	efi_memory_desc_t *mmap;
+
+	struct efi_boot_memmap map = {
+		.map		= &mmap,
+		.map_size	= &map_size,
+		.desc_size	= &desc_size,
+		.desc_ver	= &desc_ver,
+		.key_ptr	= NULL,
+		.buff_size	= &buff_size,
+	};
+
+	status = efi_get_memory_map(&map);
+
+	if (status != EFI_SUCCESS) {
+		efi_err("Unable to get EFI memory map...\n");
+		return status;
+	}
+
+	const char *signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
+					       : EFI32_LOADER_SIGNATURE;
+
+	memcpy(&efi->efi_loader_signature, signature, sizeof(__u32));
+
+	efi->efi_memdesc_size = desc_size;
+	efi->efi_memdesc_version = desc_ver;
+	efi->efi_memmap_size = map_size;
+
+	efi_set_u64_split((unsigned long)mmap,
+			  &efi->efi_memmap, &efi->efi_memmap_hi);
+
+	efi_set_u64_split((unsigned long)efi_system_table,
+			  &efi->efi_systab, &efi->efi_systab_hi);
+
+	return EFI_SUCCESS;
+}
+
+static void free_loader_data(struct boot_params *params)
+{
+	struct efi_info *efi = (void *)&params->efi_info;
+	unsigned long mmap = efi->efi_memmap;
+
+#ifdef CONFIG_64BIT
+	mmap |= ((unsigned long)efi->efi_memmap_hi << 32);
+#endif
+
+	efi_bs_call(free_pool, (void *)mmap);
+
+	efi->efi_memdesc_size = 0;
+	efi->efi_memdesc_version = 0;
+	efi->efi_memmap_size = 0;
+	efi_set_u64_split(0, &efi->efi_memmap, &efi->efi_memmap_hi);
+}
+
+unsigned long extract_kernel_direct(struct boot_params *params)
+{
+
+	extern unsigned char input_data[];
+	extern unsigned int output_len, input_len;
+
+	void *res;
+	efi_status_t status;
+	struct efi_iofunc iof = { 0 };
+
+	status = prepare_trampoline();
+
+	if (status != EFI_SUCCESS)
+		return 0;
+
+	/* Prepare environment for do_extract_kernel() call */
+	status = init_loader_data(params);
+
+	if (status != EFI_SUCCESS)
+		return 0;
+
+	iof.puthex = do_puthex;
+	iof.putstr = do_putstr;
+	iof.map_range = do_map_range;
+
+	res = efi_extract_kernel(params, &iof, input_data, input_len,
+				 (unsigned char *)startup_32, output_len);
+
+	free_loader_data(params);
+
+	return (unsigned long)res;
+}
+
+static void do_puthex(unsigned long value)
+{
+	efi_printk("%08lx", value);
+}
+
+static void do_putstr(const char *msg)
+{
+	efi_printk("%s", msg);
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 678f9c2ccafc..680184034cb7 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -230,26 +230,25 @@ static void
 setup_memory_protection(unsigned long image_base, unsigned long image_size)
 {
 	/*
-	 * Allow execution of possible trampoline used
-	 * for switching between 4- and 5-level page tables
-	 * and relocated kernel image.
-	 */
+	* Allow execution of possible trampoline used
+	* for switching between 4- and 5-level page tables
+	* and relocated kernel image.
+	*/
 
 	efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
 					   TRAMPOLINE_PLACEMENT_SIZE, 0);
 
 #ifdef CONFIG_64BIT
-	if (image_base != (unsigned long)startup_32)
-		efi_adjust_memory_range_protection(image_base, image_size, 0);
+	efi_adjust_memory_range_protection(image_base, image_size, 0);
 #else
 	/*
-	 * Clear protection flags on a whole range of possible
-	 * addresses used for KASLR. We don't need to do that
-	 * on x86_64, since KASLR/extraction is performed after
-	 * dedicated identity page tables are built and we only
-	 * need to remove possible protection on relocated image
-	 * itself disregarding further relocations.
-	 */
+	* Clear protection flags on a whole range of possible
+	* addresses used for KASLR. We don't need to do that
+	* on x86_64, since KASLR/extraction is performed after
+	* dedicated identity page tables are built and we only
+	* need to remove possible protection on relocated image
+	* itself disregarding further relocations.
+	*/
 	efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
 					   KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR,
 					   0);
@@ -270,8 +269,10 @@ static void setup_quirks(struct boot_params *boot_params,
 			retrieve_apple_device_properties(boot_params);
 	}
 
-	if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
+	if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES) &&
+	    !IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT)) {
 		setup_memory_protection(image_base, image_size);
+	}
 }
 
 /*
@@ -710,8 +711,10 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
 }
 
 /*
- * On success, we return the address of startup_32, which has potentially been
- * relocated by efi_relocate_kernel.
+ * On success, we return:
+ *   - the address of startup_32, which has potentially been
+ *     relocated by efi_relocate_kernel, if libstub direct extraction is disabled.
+ *   - extracted kernel entry point if libstub direct extraction is enabled.
  * On failure, we exit to the firmware via efi_exit instead of returning.
  */
 unsigned long efi_main(efi_handle_t handle,
@@ -736,6 +739,7 @@ unsigned long efi_main(efi_handle_t handle,
 		efi_dxe_table = NULL;
 	}
 
+#ifndef CONFIG_EFI_STUB_EXTRACT_DIRECT
 	/*
 	 * If the kernel isn't already loaded at a suitable address,
 	 * relocate it.
@@ -789,6 +793,7 @@ unsigned long efi_main(efi_handle_t handle,
 		 */
 		image_offset = 0;
 	}
+#endif
 
 #ifdef CONFIG_CMDLINE_BOOL
 	status = efi_parse_options(CONFIG_CMDLINE);
@@ -845,7 +850,13 @@ unsigned long efi_main(efi_handle_t handle,
 
 	setup_efi_pci(boot_params);
 
-	setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
+	setup_quirks(boot_params, buffer_start, buffer_end - buffer_start);
+
+#ifdef CONFIG_EFI_STUB_EXTRACT_DIRECT
+	bzimage_addr = extract_kernel_direct(boot_params);
+	if (!bzimage_addr)
+		goto fail;
+#endif
 
 	status = exit_boot(boot_params, handle);
 	if (status != EFI_SUCCESS) {
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 14/16] x86/build: Make generated PE more spec compliant
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (12 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 13/16] efi/x86: Support extracting kernel from libstub Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:39   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 15/16] efi/libstub: Add memory attribute protocol definitions Evgeniy Baskov
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Currently kernel image is not fully compliant PE image, so it may
fail to boot with stricter implementations of UEFI PE loaders.

Set minimal alignments and sizes specified by PE documentation [1]
referenced by UEFI specification [2]. Align PE header to 8 bytes.
Generate '.reloc' section with 2 entries and set reloc data directory.

To make code more readable refactor tools/build.c:
	- Use mmap() to access kernel image.
	- Generate sections dynamically.
	- Setup sections protection. Since we cannot fit every
	  needed section, set a part of protection flags
	  dynamically during initialization. This step is omitted
	  if CONFIG_EFI_DXE_MEM_ATTRIBUTES is not set.

Reduce boot sector error message size since the space for the PE header
before the zero page beginning is constrained.

Explicitly change sections permissions in efi_pe_entry in case
of incorrect EFI implementations and to reduce access rights to
compressed kernel blob. By default it is set executable due to
restriction in maximum number of sections that can fit before zero
page.

[1] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
[2] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/boot/Makefile                  |   2 +-
 arch/x86/boot/header.S                  | 110 +----
 arch/x86/boot/tools/build.c             | 575 +++++++++++++++---------
 drivers/firmware/efi/libstub/x86-stub.c |  63 ++-
 4 files changed, 452 insertions(+), 298 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index ffec8bb01ba8..828eb41c2603 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -90,7 +90,7 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|efi32_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|efi32_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
 
 quiet_cmd_zoffset = ZOFFSET $@
       cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index f912d7770130..05a75f0a1876 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -59,17 +59,16 @@ start2:
 	cld
 
 	movw	$bugger_off_msg, %si
+	movw	$bugger_off_msg_size, %cx
 
 msg_loop:
 	lodsb
-	andb	%al, %al
-	jz	bs_die
 	movb	$0xe, %ah
 	movw	$7, %bx
 	int	$0x10
-	jmp	msg_loop
+	decw	%cx
+	jnz	msg_loop
 
-bs_die:
 	# Allow the user to press a key, then reboot
 	xorw	%ax, %ax
 	int	$0x16
@@ -89,12 +88,12 @@ bs_die:
 
 	.section ".bsdata", "a"
 bugger_off_msg:
-	.ascii	"Use a boot loader.\r\n"
-	.ascii	"\n"
-	.ascii	"Remove disk and press any key to reboot...\r\n"
-	.byte	0
+	.ascii	"Use a boot loader. "
+	.ascii	"Press a key to reboot"
+	.set bugger_off_msg_size, . - bugger_off_msg
 
 #ifdef CONFIG_EFI_STUB
+	.align 8
 pe_header:
 	.long	PE_MAGIC
 
@@ -108,7 +107,7 @@ coff_header:
 	.set	pe_opt_magic, PE_OPT_MAGIC_PE32PLUS
 	.word	IMAGE_FILE_MACHINE_AMD64
 #endif
-	.word	section_count			# nr_sections
+	.word	0				# nr_sections
 	.long	0 				# TimeDateStamp
 	.long	0				# PointerToSymbolTable
 	.long	1				# NumberOfSymbols
@@ -132,7 +131,7 @@ optional_header:
 	# Filled in by build.c
 	.long	0x0000				# AddressOfEntryPoint
 
-	.long	0x0200				# BaseOfCode
+	.long	0x1000				# BaseOfCode
 #ifdef CONFIG_X86_32
 	.long	0				# data
 #endif
@@ -145,8 +144,8 @@ extra_header_fields:
 #else
 	.quad	image_base			# ImageBase
 #endif
-	.long	0x20				# SectionAlignment
-	.long	0x20				# FileAlignment
+	.long	0x1000				# SectionAlignment
+	.long	0x200				# FileAlignment
 	.word	0				# MajorOperatingSystemVersion
 	.word	0				# MinorOperatingSystemVersion
 	.word	LINUX_EFISTUB_MAJOR_VERSION	# MajorImageVersion
@@ -189,91 +188,14 @@ extra_header_fields:
 	.quad	0				# CertificationTable
 	.quad	0				# BaseRelocationTable
 
-	# Section table
-section_table:
-	#
-	# The offset & size fields are filled in by build.c.
-	#
-	.ascii	".setup"
-	.byte	0
-	.byte	0
-	.long	0
-	.long	0x0				# startup_{32,64}
-	.long	0				# Size of initialized data
-						# on disk
-	.long	0x0				# startup_{32,64}
-	.long	0				# PointerToRelocations
-	.long	0				# PointerToLineNumbers
-	.word	0				# NumberOfRelocations
-	.word	0				# NumberOfLineNumbers
-	.long	IMAGE_SCN_CNT_CODE		| \
-		IMAGE_SCN_MEM_READ		| \
-		IMAGE_SCN_MEM_EXECUTE		| \
-		IMAGE_SCN_ALIGN_16BYTES		# Characteristics
-
-	#
-	# The EFI application loader requires a relocation section
-	# because EFI applications must be relocatable. The .reloc
-	# offset & size fields are filled in by build.c.
 	#
-	.ascii	".reloc"
-	.byte	0
-	.byte	0
-	.long	0
-	.long	0
-	.long	0				# SizeOfRawData
-	.long	0				# PointerToRawData
-	.long	0				# PointerToRelocations
-	.long	0				# PointerToLineNumbers
-	.word	0				# NumberOfRelocations
-	.word	0				# NumberOfLineNumbers
-	.long	IMAGE_SCN_CNT_INITIALIZED_DATA	| \
-		IMAGE_SCN_MEM_READ		| \
-		IMAGE_SCN_MEM_DISCARDABLE	| \
-		IMAGE_SCN_ALIGN_1BYTES		# Characteristics
-
-#ifdef CONFIG_EFI_MIXED
-	#
-	# The offset & size fields are filled in by build.c.
+	# Section table
+	# It is generated by build.c and here we just need
+	# to reserve some space for sections
 	#
-	.asciz	".compat"
-	.long	0
-	.long	0x0
-	.long	0				# Size of initialized data
-						# on disk
-	.long	0x0
-	.long	0				# PointerToRelocations
-	.long	0				# PointerToLineNumbers
-	.word	0				# NumberOfRelocations
-	.word	0				# NumberOfLineNumbers
-	.long	IMAGE_SCN_CNT_INITIALIZED_DATA	| \
-		IMAGE_SCN_MEM_READ		| \
-		IMAGE_SCN_MEM_DISCARDABLE	| \
-		IMAGE_SCN_ALIGN_1BYTES		# Characteristics
-#endif
+section_table:
+	.fill 40*5, 1, 0
 
-	#
-	# The offset & size fields are filled in by build.c.
-	#
-	.ascii	".text"
-	.byte	0
-	.byte	0
-	.byte	0
-	.long	0
-	.long	0x0				# startup_{32,64}
-	.long	0				# Size of initialized data
-						# on disk
-	.long	0x0				# startup_{32,64}
-	.long	0				# PointerToRelocations
-	.long	0				# PointerToLineNumbers
-	.word	0				# NumberOfRelocations
-	.word	0				# NumberOfLineNumbers
-	.long	IMAGE_SCN_CNT_CODE		| \
-		IMAGE_SCN_MEM_READ		| \
-		IMAGE_SCN_MEM_EXECUTE		| \
-		IMAGE_SCN_ALIGN_16BYTES		# Characteristics
-
-	.set	section_count, (. - section_table) / 40
 #endif /* CONFIG_EFI_STUB */
 
 	# Kernel attributes; used by setup.  This is part 1 of the
diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index a3725ad46c5a..dc3a1efb290e 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -40,6 +40,8 @@ typedef unsigned char  u8;
 typedef unsigned short u16;
 typedef unsigned int   u32;
 
+#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
+
 #define DEFAULT_MAJOR_ROOT 0
 #define DEFAULT_MINOR_ROOT 0
 #define DEFAULT_ROOT_DEV (DEFAULT_MAJOR_ROOT << 8 | DEFAULT_MINOR_ROOT)
@@ -59,12 +61,74 @@ u8 buf[SETUP_SECT_MAX*512];
 #define PECOFF_COMPAT_RESERVE 0x0
 #endif
 
+#define PARAGRAPH_SIZE 16
+#define SECTOR_SIZE 512
+#define FILE_ALIGNMENT 512
+#define SECTION_ALIGNMENT 4096
+
+#define RELOC_SECTION_SIZE 12
+
+#ifdef CONFIG_EFI_MIXED
+#define COMPAT_SECTION_SIZE 8
+#else
+#define COMPAT_SECTION_SIZE 0
+#endif
+
+#define DOS_PECOFF_HEADER_OFFSET 0x3c
+
+#define PECOFF_CODE_SIZE_OFFSET 0x1c
+#define PECOFF_DATA_SIZE_OFFSET 0x20
+#define PECOFF_IMAGE_SIZE_OFFSET 0x50
+#define PECOFF_ENTRY_POINT_OFFSET 0x28
+#define PECOFF_SECTIONS_COUNT_OFFSET 0x6
+#define PECOFF_BASE_OF_CODE_OFFSET 0x2c
+
+#ifdef CONFIG_X86_32
+#define PECOFF_SECTION_TABLE_OFFSET 0xa8
+#define PECOFF_RELOC_DIR_OFFSET 0xa0
+#else
+#define PECOFF_SECTION_TABLE_OFFSET 0xb8
+#define PECOFF_RELOC_DIR_OFFSET 0xb0
+#endif
+
+#define PECOFF_SECTION_SIZE 0x28
+
+#define PECOFF_SCN_NAME_OFFSET 0x0
+#define PECOFF_SCN_NAME_SIZE 8
+#define PECOFF_SCN_MEMSZ_OFFSET 0x8
+#define PECOFF_SCN_VADDR_OFFSET 0xc
+#define PECOFF_SCN_FILESZ_OFFSET 0x10
+#define PECOFF_SCN_OFFSET_OFFSET 0x14
+#define PECOFF_SCN_FLAGS_OFFSET 0x24
+
+#define IMAGE_SCN_CNT_CODE	0x00000020 /* .text */
+#define IMAGE_SCN_CNT_INITIALIZED_DATA 0x00000040 /* .data */
+#define IMAGE_SCN_ALIGN_512BYTES 0x00a00000
+#define IMAGE_SCN_ALIGN_4096BYTES 0x00d00000
+#define IMAGE_SCN_MEM_DISCARDABLE 0x02000000 /* scn can be discarded */
+#define IMAGE_SCN_MEM_EXECUTE	0x20000000 /* can be executed as code */
+#define IMAGE_SCN_MEM_READ	0x40000000 /* readable */
+#define IMAGE_SCN_MEM_WRITE	0x80000000 /* writeable */
+
+#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
+#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_ALIGN_4096BYTES)
+#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
+#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
+#else
+/* With memory protection disabled all sections are RWX */
+#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
+		IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
+#define SCN_RX SCN_RW
+#define SCN_RO SCN_RW
+#endif
+
 static unsigned long efi32_stub_entry;
 static unsigned long efi64_stub_entry;
 static unsigned long efi_pe_entry;
 static unsigned long efi32_pe_entry;
 static unsigned long kernel_info;
 static unsigned long startup_64;
+static unsigned long _rodata;
 static unsigned long _ehead;
 static unsigned long _end;
 
@@ -152,91 +216,126 @@ static void usage(void)
 	die("Usage: build setup system zoffset.h image");
 }
 
-#ifdef CONFIG_EFI_STUB
-
-static void update_pecoff_section_header_fields(char *section_name, u32 vma, u32 size, u32 datasz, u32 offset)
+static void *map_file(const char *path, size_t *psize)
 {
-	unsigned int pe_header;
-	unsigned short num_sections;
-	u8 *section;
-
-	pe_header = get_unaligned_le32(&buf[0x3c]);
-	num_sections = get_unaligned_le16(&buf[pe_header + 6]);
-
-#ifdef CONFIG_X86_32
-	section = &buf[pe_header + 0xa8];
-#else
-	section = &buf[pe_header + 0xb8];
-#endif
-
-	while (num_sections > 0) {
-		if (strncmp((char*)section, section_name, 8) == 0) {
-			/* section header size field */
-			put_unaligned_le32(size, section + 0x8);
+	struct stat statbuf;
+	size_t size;
+	void *addr;
+	int fd;
 
-			/* section header vma field */
-			put_unaligned_le32(vma, section + 0xc);
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		die("Unable to open `%s': %m", path);
+	if (fstat(fd, &statbuf))
+		die("Unable to stat `%s': %m", path);
 
-			/* section header 'size of initialised data' field */
-			put_unaligned_le32(datasz, section + 0x10);
+	size = statbuf.st_size;
+	/*
+	 * Map one byte more, to allow adding null-terminator
+	 * for text files.
+	 */
+	addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	if (addr == MAP_FAILED)
+		die("Unable to mmap '%s': %m", path);
 
-			/* section header 'file offset' field */
-			put_unaligned_le32(offset, section + 0x14);
+	close(fd);
 
-			break;
-		}
-		section += 0x28;
-		num_sections--;
-	}
+	*psize = size;
+	return addr;
 }
 
-static void update_pecoff_section_header(char *section_name, u32 offset, u32 size)
+static void unmap_file(void *addr, size_t size)
 {
-	update_pecoff_section_header_fields(section_name, offset, size, size, offset);
+	munmap(addr, size + 1);
 }
 
-static void update_pecoff_setup_and_reloc(unsigned int size)
+static void *map_output_file(const char *path, size_t size)
 {
-	u32 setup_offset = 0x200;
-	u32 reloc_offset = size - PECOFF_RELOC_RESERVE - PECOFF_COMPAT_RESERVE;
-#ifdef CONFIG_EFI_MIXED
-	u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
-#endif
-	u32 setup_size = reloc_offset - setup_offset;
+	void *addr;
+	int fd;
 
-	update_pecoff_section_header(".setup", setup_offset, setup_size);
-	update_pecoff_section_header(".reloc", reloc_offset, PECOFF_RELOC_RESERVE);
+	fd = open(path, O_RDWR | O_CREAT, 0660);
+	if (fd < 0)
+		die("Unable to create `%s': %m", path);
 
-	/*
-	 * Modify .reloc section contents with a single entry. The
-	 * relocation is applied to offset 10 of the relocation section.
-	 */
-	put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
-	put_unaligned_le32(10, &buf[reloc_offset + 4]);
+	if (ftruncate(fd, size))
+		die("Unable to resize `%s': %m", path);
 
-#ifdef CONFIG_EFI_MIXED
-	update_pecoff_section_header(".compat", compat_offset, PECOFF_COMPAT_RESERVE);
+	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (addr == MAP_FAILED)
+		die("Unable to mmap '%s': %m", path);
 
-	/*
-	 * Put the IA-32 machine type (0x14c) and the associated entry point
-	 * address in the .compat section, so loaders can figure out which other
-	 * execution modes this image supports.
-	 */
-	buf[compat_offset] = 0x1;
-	buf[compat_offset + 1] = 0x8;
-	put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
-	put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset + 4]);
-#endif
+	return addr;
 }
 
-static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
-			       unsigned int init_sz)
+#ifdef CONFIG_EFI_STUB
+
+static unsigned int reloc_offset;
+static unsigned int compat_offset;
+
+#define MAX_SECTIONS 5
+
+static void emit_pecoff_section(const char *section_name, unsigned int size,
+				unsigned int bss, unsigned int *file_offset,
+				unsigned int *mem_offset, u32 flags)
 {
+	unsigned int section_memsz, section_filesz;
 	unsigned int pe_header;
-	unsigned int text_sz = file_sz - text_start;
-	unsigned int bss_sz = init_sz - file_sz;
+	unsigned short num_sections;
+	u8 *pnum_sections;
+	u8 *section;
+
+	pe_header = get_unaligned_le32(&buf[DOS_PECOFF_HEADER_OFFSET]);
+	pnum_sections = &buf[pe_header + PECOFF_SECTIONS_COUNT_OFFSET];
+	num_sections = get_unaligned_le16(pnum_sections);
+	if (num_sections >= MAX_SECTIONS)
+		die("Not enough space to generate all sections");
+
+	section = &buf[pe_header + PECOFF_SECTION_TABLE_OFFSET];
+	section += PECOFF_SECTION_SIZE * num_sections;
+
+	if ((size & (FILE_ALIGNMENT - 1)) || (bss & (FILE_ALIGNMENT - 1)))
+		die("Section '%s' is improperly aligned", section_name);
+
+	section_memsz = round_up(size + bss, SECTION_ALIGNMENT);
+	section_filesz = round_up(size, FILE_ALIGNMENT);
+
+	/* Zero out all section fields */
+	memset(section, 0, PECOFF_SECTION_SIZE);
+
+	/* Section header size field */
+	strncpy((char *)(section + PECOFF_SCN_NAME_OFFSET),
+		section_name, PECOFF_SCN_NAME_SIZE);
 
-	pe_header = get_unaligned_le32(&buf[0x3c]);
+	put_unaligned_le32(section_memsz, section + PECOFF_SCN_MEMSZ_OFFSET);
+	put_unaligned_le32(*mem_offset, section + PECOFF_SCN_VADDR_OFFSET);
+	put_unaligned_le32(section_filesz, section + PECOFF_SCN_FILESZ_OFFSET);
+	put_unaligned_le32(*file_offset, section + PECOFF_SCN_OFFSET_OFFSET);
+	put_unaligned_le32(flags, section + PECOFF_SCN_FLAGS_OFFSET);
+
+	put_unaligned_le16(num_sections + 1, pnum_sections);
+
+	*mem_offset += section_memsz;
+	*file_offset += section_filesz;
+
+}
+
+#define BASE_RVA 0x1000
+
+static unsigned int update_pecoff_sections(unsigned int setup_size,
+					   unsigned int file_size,
+					   unsigned int init_size,
+					   unsigned int text_size)
+{
+	/* First section starts at 512 byes, after PE header */
+	unsigned int mem_offset = BASE_RVA, file_offset = SECTOR_SIZE;
+	unsigned int compat_size, reloc_size, image_size, text_rva;
+	unsigned int pe_header, bss_size, text_rva_diff, reloc_rva;
+
+	pe_header = get_unaligned_le32(&buf[DOS_PECOFF_HEADER_OFFSET]);
+
+	if (get_unaligned_le32(&buf[pe_header + PECOFF_SECTIONS_COUNT_OFFSET]))
+		die("Some sections present in PE file");
 
 	/*
 	 * The PE/COFF loader may load the image at an address which is
@@ -247,42 +346,121 @@ static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
 	 * add slack to allow the buffer to be aligned within the declared size
 	 * of the image.
 	 */
-	bss_sz	+= CONFIG_PHYSICAL_ALIGN;
-	init_sz	+= CONFIG_PHYSICAL_ALIGN;
+	init_size += CONFIG_PHYSICAL_ALIGN;
+	image_size = init_size;
+
+	reloc_size = round_up(RELOC_SECTION_SIZE, FILE_ALIGNMENT);
+	compat_size = round_up(COMPAT_SECTION_SIZE, FILE_ALIGNMENT);
+
+	/*
+	 * Let's remove extra memory used by special sections
+	 * and use it as a part of bss.
+	 */
+	init_size -= round_up(reloc_size, SECTION_ALIGNMENT);
+	init_size -= round_up(compat_size, SECTION_ALIGNMENT);
+	if (init_size < file_size + setup_size) {
+		init_size = file_size + setup_size;
+		image_size += round_up(reloc_size, SECTION_ALIGNMENT);
+		image_size += round_up(compat_size, SECTION_ALIGNMENT);
+	}
 
 	/*
-	 * Size of code: Subtract the size of the first sector (512 bytes)
-	 * which includes the header.
+	 * Update sections offsets.
+	 * NOTE: Order is important
 	 */
-	put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header + 0x1c]);
 
-	/* Size of image */
-	put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
+	bss_size = init_size - file_size - setup_size;
+
+	emit_pecoff_section(".setup", setup_size - SECTOR_SIZE, 0,
+			    &file_offset, &mem_offset, SCN_RO |
+			    IMAGE_SCN_CNT_INITIALIZED_DATA);
+
+	text_rva_diff = mem_offset - file_offset;
+	text_rva = mem_offset;
+	emit_pecoff_section(".text", text_size, 0,
+			    &file_offset, &mem_offset, SCN_RX |
+			    IMAGE_SCN_CNT_CODE);
+
+	/* Check that kernel sections mapping is contiguous */
+	if (text_rva_diff != mem_offset - file_offset)
+		die("Kernel sections mapping is wrong: %#x != %#x",
+		    mem_offset - file_offset, text_rva_diff);
+
+	emit_pecoff_section(".data", file_size - text_size, bss_size,
+			    &file_offset, &mem_offset, SCN_RW |
+			    IMAGE_SCN_CNT_INITIALIZED_DATA);
+
+	reloc_offset = file_offset;
+	reloc_rva = mem_offset;
+	emit_pecoff_section(".reloc", reloc_size, 0,
+			    &file_offset, &mem_offset, SCN_RW |
+			    IMAGE_SCN_CNT_INITIALIZED_DATA |
+			    IMAGE_SCN_MEM_DISCARDABLE);
+
+	compat_offset = file_offset;
+#ifdef CONFIG_EFI_MIXED
+	emit_pecoff_section(".comat", compat_size, 0,
+			    &file_offset, &mem_offset, SCN_RW |
+			    IMAGE_SCN_CNT_INITIALIZED_DATA |
+			    IMAGE_SCN_MEM_DISCARDABLE);
+#endif
+
+	if (file_size + setup_size + reloc_size + compat_size != file_offset)
+		die("file_size(%#x) != filesz(%#x)",
+		    file_size + setup_size + reloc_size + compat_size, file_offset);
+
+	/* Size of code. */
+	put_unaligned_le32(round_up(text_size, SECTION_ALIGNMENT),
+			   &buf[pe_header + PECOFF_CODE_SIZE_OFFSET]);
+	/*
+	 * Size of data.
+	 * Exclude text size and first sector, which contains PE header.
+	 */
+	put_unaligned_le32(mem_offset - round_up(text_size, SECTION_ALIGNMENT),
+			   &buf[pe_header + PECOFF_DATA_SIZE_OFFSET]);
+
+	/* Size of image. */
+	put_unaligned_le32(mem_offset, &buf[pe_header + PECOFF_IMAGE_SIZE_OFFSET]);
 
 	/*
 	 * Address of entry point for PE/COFF executable
 	 */
-	put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header + 0x28]);
+	put_unaligned_le32(text_rva + efi_pe_entry, &buf[pe_header + PECOFF_ENTRY_POINT_OFFSET]);
 
-	update_pecoff_section_header_fields(".text", text_start, text_sz + bss_sz,
-					    text_sz, text_start);
-}
+	/*
+	 * BaseOfCode for PE/COFF executable
+	 */
+	put_unaligned_le32(text_rva, &buf[pe_header + PECOFF_BASE_OF_CODE_OFFSET]);
 
-static int reserve_pecoff_reloc_section(int c)
-{
-	/* Reserve 0x20 bytes for .reloc section */
-	memset(buf+c, 0, PECOFF_RELOC_RESERVE);
-	return PECOFF_RELOC_RESERVE;
+	/*
+	 * Since we have generated .reloc section, we need to
+	 * fill-in Reloc directory
+	 */
+	put_unaligned_le32(reloc_rva, &buf[pe_header + PECOFF_RELOC_DIR_OFFSET]);
+	put_unaligned_le32(RELOC_SECTION_SIZE, &buf[pe_header + PECOFF_RELOC_DIR_OFFSET + 4]);
+
+	return file_offset;
 }
 
-static void efi_stub_defaults(void)
+static void generate_pecoff_section_data(u8 *output, unsigned int setup_size)
 {
-	/* Defaults for old kernel */
-#ifdef CONFIG_X86_32
-	efi_pe_entry = 0x10;
-#else
-	efi_pe_entry = 0x210;
-	startup_64 = 0x200;
+	/*
+	 * Modify .reloc section contents with a two entries. The
+	 * relocation is applied to offset 10 of the relocation section.
+	 */
+	put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE, &output[reloc_offset]);
+	put_unaligned_le32(RELOC_SECTION_SIZE, &output[reloc_offset + 4]);
+
+#ifdef CONFIG_EFI_MIXED
+	/*
+	 * Put the IA-32 machine type (0x14c) and the associated entry point
+	 * address in the .compat section, so loaders can figure out which other
+	 * execution modes this image supports.
+	 */
+	output[compat_offset] = 0x1;
+	output[compat_offset + 1] = 0x8;
+	put_unaligned_le16(0x14c, &output[compat_offset + 2]);
+	put_unaligned_le32(efi32_pe_entry + setup_size, &output[compat_offset + 4]);
 #endif
 }
 
@@ -297,33 +475,27 @@ static void efi_stub_entry_update(void)
 
 #ifdef CONFIG_EFI_MIXED
 	if (efi32_stub_entry != addr)
-		die("32-bit and 64-bit EFI entry points do not match\n");
+		die("32-bit and 64-bit EFI entry points do not match");
 #endif
 	put_unaligned_le32(addr, &buf[0x264]);
 }
 
+static void efi_stub_update_defaults(void)
+{
+	/* Defaults for old kernel */
+#ifdef CONFIG_X86_32
+	efi_pe_entry = 0x10;
+#else
+	efi_pe_entry = 0x210;
+	startup_64 = 0x200;
+#endif
+}
 #else
 
-static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
-static inline void update_pecoff_text(unsigned int text_start,
-				      unsigned int file_sz,
-				      unsigned int init_sz) {}
-static inline void efi_stub_defaults(void) {}
-static inline void efi_stub_entry_update(void) {}
+static void efi_stub_update_defaults(void) {}
 
-static inline int reserve_pecoff_reloc_section(int c)
-{
-	return 0;
-}
 #endif /* CONFIG_EFI_STUB */
 
-static int reserve_pecoff_compat_section(int c)
-{
-	/* Reserve 0x20 bytes for .compat section */
-	memset(buf+c, 0, PECOFF_COMPAT_RESERVE);
-	return PECOFF_COMPAT_RESERVE;
-}
-
 /*
  * Parse zoffset.h and find the entry points. We could just #include zoffset.h
  * but that would mean tools/build would have to be rebuilt every time. It's
@@ -336,20 +508,15 @@ static int reserve_pecoff_compat_section(int c)
 
 static void parse_zoffset(char *fname)
 {
-	FILE *file;
-	char *p;
-	int c;
+	size_t size;
+	char *data, *p;
 
-	file = fopen(fname, "r");
-	if (!file)
-		die("Unable to open `%s': %m", fname);
-	c = fread(buf, 1, sizeof(buf) - 1, file);
-	if (ferror(file))
-		die("read-error on `zoffset.h'");
-	fclose(file);
-	buf[c] = 0;
+	data = map_file(fname, &size);
+
+	/* We can do that, since we mapped one byte more */
+	data[size] = 0;
 
-	p = (char *)buf;
+	p = (char *)data;
 
 	while (p && *p) {
 		PARSE_ZOFS(p, efi32_stub_entry);
@@ -358,6 +525,7 @@ static void parse_zoffset(char *fname)
 		PARSE_ZOFS(p, efi32_pe_entry);
 		PARSE_ZOFS(p, kernel_info);
 		PARSE_ZOFS(p, startup_64);
+		PARSE_ZOFS(p, _rodata);
 		PARSE_ZOFS(p, _ehead);
 		PARSE_ZOFS(p, _end);
 
@@ -365,82 +533,93 @@ static void parse_zoffset(char *fname)
 		while (p && (*p == '\r' || *p == '\n'))
 			p++;
 	}
+
+	unmap_file(data, size);
 }
 
-int main(int argc, char ** argv)
+static unsigned int read_setup(char *path)
 {
-	unsigned int i, sz, setup_sectors, init_sz;
-	int c;
-	u32 sys_size;
-	struct stat sb;
-	FILE *file, *dest;
-	int fd;
-	void *kernel;
-	u32 crc = 0xffffffffUL;
-
-	efi_stub_defaults();
-
-	if (argc != 5)
-		usage();
-	parse_zoffset(argv[3]);
-
-	dest = fopen(argv[4], "w");
-	if (!dest)
-		die("Unable to write `%s': %m", argv[4]);
+	FILE *file;
+	unsigned int setup_size, file_size;
 
 	/* Copy the setup code */
-	file = fopen(argv[1], "r");
+	file = fopen(path, "r");
 	if (!file)
-		die("Unable to open `%s': %m", argv[1]);
-	c = fread(buf, 1, sizeof(buf), file);
+		die("Unable to open `%s': %m", path);
+
+	file_size = fread(buf, 1, sizeof(buf), file);
 	if (ferror(file))
 		die("read-error on `setup'");
-	if (c < 1024)
+
+	if (file_size < 2 * SECTOR_SIZE)
 		die("The setup must be at least 1024 bytes");
-	if (get_unaligned_le16(&buf[510]) != 0xAA55)
+
+	if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
 		die("Boot block hasn't got boot flag (0xAA55)");
-	fclose(file);
 
-	c += reserve_pecoff_compat_section(c);
-	c += reserve_pecoff_reloc_section(c);
+	fclose(file);
 
 	/* Pad unused space with zeros */
-	setup_sectors = (c + 511) / 512;
-	if (setup_sectors < SETUP_SECT_MIN)
-		setup_sectors = SETUP_SECT_MIN;
-	i = setup_sectors*512;
-	memset(buf+c, 0, i-c);
+	setup_size = round_up(file_size, SECTOR_SIZE);
+
+	if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
+		setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
 
-	update_pecoff_setup_and_reloc(i);
+	/*
+	 * Global buffer is already initialised
+	 * to 0, but just in case, zero out padding.
+	 */
+
+	memset(buf + file_size, 0, setup_size - file_size);
+
+	return setup_size;
+}
+
+int main(int argc, char **argv)
+{
+	size_t kern_file_size;
+	unsigned int setup_size;
+	unsigned int setup_sectors;
+	unsigned int init_size;
+	unsigned int total_size;
+	unsigned int kern_size;
+	void *kernel;
+	u32 crc = 0xffffffffUL;
+	u8 *output;
+
+	if (argc != 5)
+		usage();
+
+	efi_stub_update_defaults();
+	parse_zoffset(argv[3]);
+
+	setup_size = read_setup(argv[1]);
+
+	setup_sectors = setup_size/SECTOR_SIZE;
 
 	/* Set the default root device */
 	put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
 
-	/* Open and stat the kernel file */
-	fd = open(argv[2], O_RDONLY);
-	if (fd < 0)
-		die("Unable to open `%s': %m", argv[2]);
-	if (fstat(fd, &sb))
-		die("Unable to stat `%s': %m", argv[2]);
-	sz = sb.st_size;
-	kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
-	if (kernel == MAP_FAILED)
-		die("Unable to mmap '%s': %m", argv[2]);
-	/* Number of 16-byte paragraphs, including space for a 4-byte CRC */
-	sys_size = (sz + 15 + 4) / 16;
+	/* Map kernel file to memory */
+	kernel = map_file(argv[2], &kern_file_size);
+
 #ifdef CONFIG_EFI_STUB
-	/*
-	 * COFF requires minimum 32-byte alignment of sections, and
-	 * adding a signature is problematic without that alignment.
-	 */
-	sys_size = (sys_size + 1) & ~1;
+	/* PE specification require 512-byte minimum section file alignment */
+	kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
+#else
+	/* Number of 16-byte paragraphs, including space for a 4-byte CRC */
+	kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
 #endif
 
 	/* Patch the setup code with the appropriate size parameters */
-	buf[0x1f1] = setup_sectors-1;
-	put_unaligned_le32(sys_size, &buf[0x1f4]);
+	buf[0x1f1] = setup_sectors - 1;
+	put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
+
+	/* Update kernel_info offset. */
+	put_unaligned_le32(kernel_info, &buf[0x268]);
+
+	init_size = get_unaligned_le32(&buf[0x260]);
 
-	init_sz = get_unaligned_le32(&buf[0x260]);
 #ifdef CONFIG_EFI_STUB
 	/*
 	 * The decompression buffer will start at ImageBase. When relocating
@@ -456,45 +635,39 @@ int main(int argc, char ** argv)
 	 * For future-proofing, increase init_sz if necessary.
 	 */
 
-	if (init_sz - _end < i + _ehead) {
-		init_sz = (i + _ehead + _end + 4095) & ~4095;
-		put_unaligned_le32(init_sz, &buf[0x260]);
+	if (init_size - _end < setup_size + _ehead) {
+		init_size = round_up(setup_size + _ehead + _end, SECTION_ALIGNMENT);
+		put_unaligned_le32(init_size, &buf[0x260]);
 	}
-#endif
-	update_pecoff_text(setup_sectors * 512, i + (sys_size * 16), init_sz);
 
-	efi_stub_entry_update();
+	total_size = update_pecoff_sections(setup_size, kern_size, init_size, _rodata);
 
-	/* Update kernel_info offset. */
-	put_unaligned_le32(kernel_info, &buf[0x268]);
+	efi_stub_entry_update();
+#else
+	(void)init_size;
+	total_size = setup_size + kern_size;
+#endif
 
-	crc = partial_crc32(buf, i, crc);
-	if (fwrite(buf, 1, i, dest) != i)
-		die("Writing setup failed");
+	output = map_output_file(argv[4], total_size);
 
-	/* Copy the kernel code */
-	crc = partial_crc32(kernel, sz, crc);
-	if (fwrite(kernel, 1, sz, dest) != sz)
-		die("Writing kernel failed");
+	memcpy(output, buf, setup_size);
+	memcpy(output + setup_size, kernel, kern_file_size);
+	memset(output + setup_size + kern_file_size, 0, kern_size - kern_file_size);
 
-	/* Add padding leaving 4 bytes for the checksum */
-	while (sz++ < (sys_size*16) - 4) {
-		crc = partial_crc32_one('\0', crc);
-		if (fwrite("\0", 1, 1, dest) != 1)
-			die("Writing padding failed");
-	}
+#ifdef CONFIG_EFI_STUB
+	generate_pecoff_section_data(output, setup_size);
+#endif
 
-	/* Write the CRC */
-	put_unaligned_le32(crc, buf);
-	if (fwrite(buf, 1, 4, dest) != 4)
-		die("Writing CRC failed");
+	/* Calculate and write kernel checksum. */
+	crc = partial_crc32(output, total_size - 4, crc);
+	put_unaligned_le32(crc, &output[total_size - 4]);
 
-	/* Catch any delayed write failures */
-	if (fclose(dest))
-		die("Writing image failed");
+	/* Catch any delayed write failures. */
+	if (munmap(output, total_size) < 0)
+		die("Writing kernel failed");
 
-	close(fd);
+	unmap_file(kernel, kern_file_size);
 
-	/* Everything is OK */
+	/* Everything is OK. */
 	return 0;
 }
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 680184034cb7..914106d547a6 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -392,6 +392,60 @@ static void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
 		asm("hlt");
 }
 
+
+/*
+ * Manually setup memory protection attributes for each ELF section
+ * since we cannot do it properly by using PE sections.
+ */
+static void setup_sections_memory_protection(void *image_base,
+					     unsigned long init_size)
+{
+#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
+	efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+
+	if (!efi_dxe_table ||
+	    efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
+		efi_warn("Unable to locate EFI DXE services table\n");
+		efi_dxe_table = NULL;
+		return;
+	}
+
+	extern char _head[], _ehead[];
+	extern char _compressed[], _ecompressed[];
+	extern char _text[], _etext[];
+	extern char _rodata[], _erodata[];
+	extern char _data[];
+
+	/* .setup [image_base, _head] */
+	efi_adjust_memory_range_protection((unsigned long)image_base,
+					   (unsigned long)_head - (unsigned long)image_base,
+					   EFI_MEMORY_RO | EFI_MEMORY_XP);
+	/* .head.text [_head, _ehead] */
+	efi_adjust_memory_range_protection((unsigned long)_head,
+					   (unsigned long)_ehead - (unsigned long)_head,
+					   EFI_MEMORY_RO);
+	/* .rodata..compressed [_compressed, _ecompressed] */
+	efi_adjust_memory_range_protection((unsigned long)_compressed,
+					   (unsigned long)_ecompressed - (unsigned long)_compressed,
+					   EFI_MEMORY_RO | EFI_MEMORY_XP);
+	/* .text [_text, _etext] */
+	efi_adjust_memory_range_protection((unsigned long)_text,
+					   (unsigned long)_etext - (unsigned long)_text,
+					   EFI_MEMORY_RO);
+	/* .rodata [_rodata, _erodata] */
+	efi_adjust_memory_range_protection((unsigned long)_rodata,
+					   (unsigned long)_erodata - (unsigned long)_rodata,
+					   EFI_MEMORY_RO | EFI_MEMORY_XP);
+	/* .data, .bss [_data, image_base + init_size] */
+	efi_adjust_memory_range_protection((unsigned long)_data,
+					   (unsigned long)image_base + init_size - (unsigned long)_rodata,
+					   EFI_MEMORY_XP);
+#else
+	(void)image_base;
+	(void)init_size;
+#endif
+}
+
 void __noreturn efi_stub_entry(efi_handle_t handle,
 			       efi_system_table_t *sys_table_arg,
 			       struct boot_params *boot_params);
@@ -438,10 +492,15 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
 
 	hdr = &boot_params->hdr;
 
-	/* Copy the setup header from the second sector to boot_params */
-	memcpy(&hdr->jump, image_base + 512,
+	/*
+	 * Copy the setup header from the second sector
+	 * (mapped to image_base + 0x1000) to boot_params
+	 */
+	memcpy(&hdr->jump, image_base + 0x1000,
 	       sizeof(struct setup_header) - offsetof(struct setup_header, jump));
 
+	setup_sections_memory_protection(image_base, hdr->init_size);
+
 	/*
 	 * Fill out some of the header fields ourselves because the
 	 * EFI firmware loader doesn't load the first sector.
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 15/16] efi/libstub: Add memory attribute protocol definitions
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (13 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 14/16] x86/build: Make generated PE more spec compliant Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-19  7:39   ` Ard Biesheuvel
  2022-09-06 10:41 ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Evgeniy Baskov
  2022-10-18 21:04 ` [PATCH 00/16] x86_64: Improvements at compressed kernel stage Peter Jones
  16 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

EFI_MEMORY_ATTRIBUTE_PROTOCOL servers as a better alternative to
DXE services for setting memory attributes in EFI Boot Services
environment. This protocol is better since it is a part of UEFI
specification itself and not UEFI PI specification like DXE
services.

Add EFI_MEMORY_ATTRIBUTE_PROTOCOL definitions.
Support mixed mode properly for its calls.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 arch/x86/include/asm/efi.h             |  7 +++++++
 drivers/firmware/efi/libstub/efistub.h | 22 ++++++++++++++++++++++
 include/linux/efi.h                    |  1 +
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 233ae6986d6f..522ff2e443b3 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -325,6 +325,13 @@ static inline u32 efi64_convert_status(efi_status_t status)
 #define __efi64_argmap_set_memory_space_attributes(phys, size, flags) \
 	(__efi64_split(phys), __efi64_split(size), __efi64_split(flags))
 
+/* Memory Attribute Protocol */
+#define __efi64_argmap_set_memory_attributes(protocol, phys, size, flags) \
+	((protocol), __efi64_split(phys), __efi64_split(size), __efi64_split(flags))
+
+#define __efi64_argmap_clear_memory_attributes(protocol, phys, size, flags) \
+	((protocol), __efi64_split(phys), __efi64_split(size), __efi64_split(flags))
+
 /*
  * The macros below handle the plumbing for the argument mapping. To add a
  * mapping for a specific EFI method, simply define a macro
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index cdd1bb50c786..87973f104731 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -39,6 +39,9 @@ extern const efi_system_table_t *efi_system_table;
 typedef union efi_dxe_services_table efi_dxe_services_table_t;
 extern const efi_dxe_services_table_t *efi_dxe_table;
 
+typedef union efi_memory_attribute_protocol efi_memory_attribute_protocol_t;
+extern efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
+
 efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
 				   efi_system_table_t *sys_table_arg);
 
@@ -403,6 +406,25 @@ union efi_dxe_services_table {
 	} mixed_mode;
 };
 
+union  efi_memory_attribute_protocol {
+	struct {
+		void *get_memory_attributes;
+		efi_status_t (__efiapi *set_memory_attributes)(efi_memory_attribute_protocol_t *,
+								efi_physical_addr_t,
+								u64,
+								u64);
+		efi_status_t (__efiapi *clear_memory_attributes)(efi_memory_attribute_protocol_t *,
+								  efi_physical_addr_t,
+								  u64,
+								  u64);
+	};
+	struct {
+		u32 get_memory_attributes;
+		u32 set_memory_attributes;
+		u32 clear_memory_attributes;
+	} mixed_mode;
+};
+
 typedef union efi_uga_draw_protocol efi_uga_draw_protocol_t;
 
 union efi_uga_draw_protocol {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index d2b84c2fec39..d32368a32285 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -386,6 +386,7 @@ void efi_native_runtime_setup(void);
 #define EFI_LOAD_FILE2_PROTOCOL_GUID		EFI_GUID(0x4006c0c1, 0xfcb3, 0x403e,  0x99, 0x6d, 0x4a, 0x6c, 0x87, 0x24, 0xe0, 0x6d)
 #define EFI_RT_PROPERTIES_TABLE_GUID		EFI_GUID(0xeb66918a, 0x7eef, 0x402a,  0x84, 0x2e, 0x93, 0x1d, 0x21, 0xc3, 0x8a, 0xe9)
 #define EFI_DXE_SERVICES_TABLE_GUID		EFI_GUID(0x05ad34ba, 0x6f02, 0x4214,  0x95, 0x2e, 0x4d, 0xa0, 0x39, 0x8e, 0x2b, 0xb9)
+#define EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID	EFI_GUID(0xf4560cf6, 0x40ec, 0x4b4a,  0xa1, 0x92, 0xbf, 0x1d, 0x57, 0xd0, 0xb1, 0x89)
 
 #define EFI_IMAGE_SECURITY_DATABASE_GUID	EFI_GUID(0xd719b2cb, 0x3d3a, 0x4596,  0xa3, 0xbc, 0xda, 0xd0, 0x0e, 0x67, 0x65, 0x6f)
 #define EFI_SHIM_LOCK_GUID			EFI_GUID(0x605dab50, 0xe046, 0x4300,  0xab, 0xb6, 0x3d, 0xd8, 0x10, 0xdd, 0x8b, 0x23)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 16/16] efi/libstub: Use memory attribute protocol
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (14 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 15/16] efi/libstub: Add memory attribute protocol definitions Evgeniy Baskov
@ 2022-09-06 10:41 ` Evgeniy Baskov
  2022-10-18 20:51   ` [PATCH] efi/libstub: make memory protection warnings include newlines Peter Jones
  2022-10-19  7:42   ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Ard Biesheuvel
  2022-10-18 21:04 ` [PATCH 00/16] x86_64: Improvements at compressed kernel stage Peter Jones
  16 siblings, 2 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-09-06 10:41 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

Add EFI_MEMORY_ATTRIBUTE_PROTOCOL as preferred alternative to DXE
services for changing memory attributes in the EFISTUB.

Use DXE services only as a fallback in case aforementioned protocol
is not supported by UEFI implementation.

Move DXE services initialization code closer to the place they are used
to match EFI_MEMORY_ATTRIBUTE_PROTOCOL initialization code.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
---
 drivers/firmware/efi/libstub/mem.c      | 166 ++++++++++++++++++------
 drivers/firmware/efi/libstub/x86-stub.c |  17 ---
 2 files changed, 127 insertions(+), 56 deletions(-)

diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index 89ebc8ad2c22..8c8782993b30 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -5,6 +5,9 @@
 
 #include "efistub.h"
 
+const efi_dxe_services_table_t *efi_dxe_table;
+efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
+
 static inline bool mmap_has_headroom(unsigned long buff_size,
 				     unsigned long map_size,
 				     unsigned long desc_size)
@@ -131,50 +134,32 @@ void efi_free(unsigned long size, unsigned long addr)
 	efi_bs_call(free_pages, addr, nr_pages);
 }
 
-/**
- * efi_adjust_memory_range_protection() - change memory range protection attributes
- * @start:	memory range start address
- * @size:	memory range size
- *
- * Actual memory range for which memory attributes are modified is
- * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
- * that includes [start, start + size].
- *
- * @return: status code
- */
-efi_status_t efi_adjust_memory_range_protection(unsigned long start,
-						unsigned long size,
-						unsigned long attributes)
+static void retrive_dxe_table(void)
+{
+	efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+	if (efi_dxe_table &&
+	    efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
+		efi_warn("Ignoring DXE services table: invalid signature\n");
+		efi_dxe_table = NULL;
+	}
+}
+
+static efi_status_t adjust_mem_attrib_dxe(efi_physical_addr_t rounded_start,
+					  efi_physical_addr_t rounded_end,
+					  unsigned long attributes)
 {
 	efi_status_t status;
 	efi_gcd_memory_space_desc_t desc;
-	efi_physical_addr_t end, next;
-	efi_physical_addr_t rounded_start, rounded_end;
+	efi_physical_addr_t end, next, start;
 	efi_physical_addr_t unprotect_start, unprotect_size;
 	int has_system_memory = 0;
 
-	if (efi_dxe_table == NULL)
-		return EFI_UNSUPPORTED;
+	if (!efi_dxe_table) {
+		retrive_dxe_table();
 
-	/*
-	 * This function should not be used to modify attributes
-	 * other than writable/executable.
-	 */
-
-	if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
-		return EFI_INVALID_PARAMETER;
-
-	/*
-	 * Disallow simultaniously executable and writable memory
-	 * to inforce W^X policy if direct extraction code is enabled.
-	 */
-
-	if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
-	    IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))
-		return EFI_INVALID_PARAMETER;
-
-	rounded_start = rounddown(start, EFI_PAGE_SIZE);
-	rounded_end = roundup(start + size, EFI_PAGE_SIZE);
+		if (!efi_dxe_table)
+			return EFI_UNSUPPORTED;
+	}
 
 	/*
 	 * Don't modify memory region attributes, they are
@@ -182,14 +167,15 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
 	 * encounter firmware bugs.
 	 */
 
-	for (end = start + size; start < end; start = next) {
+
+	for (start = rounded_start, end = rounded_end; start < end; start = next) {
 
 		status = efi_dxe_call(get_memory_space_descriptor,
 				      start, &desc);
 
 		if (status != EFI_SUCCESS) {
 			efi_warn("Unable to get memory descriptor at %lx\n",
-				 start);
+				 (unsigned long)start);
 			return status;
 		}
 
@@ -231,3 +217,105 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
 
 	return EFI_SUCCESS;
 }
+
+static void retrive_memory_attributes_proto(void)
+{
+	efi_status_t status;
+	efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
+
+	status = efi_bs_call(locate_protocol, &guid, NULL,
+			     (void **)&efi_mem_attrib_proto);
+	if (status != EFI_SUCCESS)
+		efi_mem_attrib_proto = NULL;
+}
+
+/**
+ * efi_adjust_memory_range_protection() - change memory range protection attributes
+ * @start:	memory range start address
+ * @size:	memory range size
+ *
+ * Actual memory range for which memory attributes are modified is
+ * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
+ * that includes [start, start + size].
+ *
+ * This function first attempts to use EFI_MEMORY_ATTRIBUTE_PROTOCOL,
+ * that is a part of UEFI Specification since version 2.10.
+ * If the protocol is unavailable it falls back to DXE services functions.
+ *
+ * @return: status code
+ */
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+						unsigned long size,
+						unsigned long attributes)
+{
+	efi_status_t status;
+	efi_physical_addr_t rounded_start, rounded_end;
+	unsigned long attr_clear;
+
+	/*
+	 * This function should not be used to modify attributes
+	 * other than writable/executable.
+	 */
+
+	if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
+		return EFI_INVALID_PARAMETER;
+
+	/*
+	 * Disallow simultaniously executable and writable memory
+	 * to inforce W^X policy if direct extraction code is enabled.
+	 */
+
+	if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
+	    IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))
+		return EFI_INVALID_PARAMETER;
+
+	rounded_start = rounddown(start, EFI_PAGE_SIZE);
+	rounded_end = roundup(start + size, EFI_PAGE_SIZE);
+
+	if (!efi_mem_attrib_proto) {
+		retrive_memory_attributes_proto();
+
+		/* Fall back to DXE services if unsupported */
+		if (!efi_mem_attrib_proto) {
+			return adjust_mem_attrib_dxe(rounded_start,
+						     rounded_end,
+						     attributes);
+		}
+	}
+
+	/*
+	 * Unlike DXE services functions, EFI_MEMORY_ATTRIBUTE_PROTOCOL
+	 * does not clear unset protection bit, so it needs to be cleared
+	 * explcitly
+	 */
+
+	attr_clear = ~attributes &
+		     (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);
+
+	status = efi_call_proto(efi_mem_attrib_proto,
+				clear_memory_attributes,
+				rounded_start,
+				rounded_end - rounded_start,
+				attr_clear);
+	if (status != EFI_SUCCESS) {
+		efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx",
+			 (unsigned long)rounded_start,
+			 (unsigned long)rounded_end,
+			 status);
+		return status;
+	}
+
+	status = efi_call_proto(efi_mem_attrib_proto,
+				set_memory_attributes,
+				rounded_start,
+				rounded_end - rounded_start,
+				attributes);
+	if (status != EFI_SUCCESS) {
+		efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx",
+			 (unsigned long)rounded_start,
+			 (unsigned long)rounded_end,
+			 status);
+	}
+
+	return status;
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 914106d547a6..dd1e1e663072 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -22,7 +22,6 @@
 #define MAXMEM_X86_64_4LEVEL (1ull << 46)
 
 const efi_system_table_t *efi_system_table;
-const efi_dxe_services_table_t *efi_dxe_table;
 extern u32 image_offset;
 static efi_loaded_image_t *image = NULL;
 
@@ -401,15 +400,6 @@ static void setup_sections_memory_protection(void *image_base,
 					     unsigned long init_size)
 {
 #ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
-	efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
-
-	if (!efi_dxe_table ||
-	    efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
-		efi_warn("Unable to locate EFI DXE services table\n");
-		efi_dxe_table = NULL;
-		return;
-	}
-
 	extern char _head[], _ehead[];
 	extern char _compressed[], _ecompressed[];
 	extern char _text[], _etext[];
@@ -791,13 +781,6 @@ unsigned long efi_main(efi_handle_t handle,
 	if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
 		efi_exit(handle, EFI_INVALID_PARAMETER);
 
-	efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
-	if (efi_dxe_table &&
-	    efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
-		efi_warn("Ignoring DXE services table: invalid signature\n");
-		efi_dxe_table = NULL;
-	}
-
 #ifndef CONFIG_EFI_STUB_EXTRACT_DIRECT
 	/*
 	 * If the kernel isn't already loaded at a suitable address,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH] efi/libstub: make memory protection warnings include newlines.
  2022-09-06 10:41 ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Evgeniy Baskov
@ 2022-10-18 20:51   ` Peter Jones
  2022-10-19  7:44     ` Ard Biesheuvel
  2022-10-19  7:42   ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Ard Biesheuvel
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Jones @ 2022-10-18 20:51 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening, Ard Biesheuvel,
	Peter Jones

efi_warn() doesn't put newlines on messages, and that makes reading
warnings without newlines hard to do.

Signed-off-by: Peter Jones <pjones@redhat.com>
---
 drivers/firmware/efi/libstub/mem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index 4d6c7f4fb7e..1b874096109 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -293,7 +293,7 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
 				rounded_end - rounded_start,
 				attr_clear);
 	if (status != EFI_SUCCESS) {
-		efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx",
+		efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx\n",
 			 (unsigned long)rounded_start,
 			 (unsigned long)rounded_end,
 			 status);
@@ -306,7 +306,7 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
 				rounded_end - rounded_start,
 				attributes);
 	if (status != EFI_SUCCESS) {
-		efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx",
+		efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx\n",
 			 (unsigned long)rounded_start,
 			 (unsigned long)rounded_end,
 			 status);
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/16] x86_64: Improvements at compressed kernel stage
  2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
                   ` (15 preceding siblings ...)
  2022-09-06 10:41 ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Evgeniy Baskov
@ 2022-10-18 21:04 ` Peter Jones
  2022-10-20 11:05   ` Evgeniy Baskov
  16 siblings, 1 reply; 51+ messages in thread
From: Peter Jones @ 2022-10-18 21:04 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Ard Biesheuvel, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

On Tue, Sep 06, 2022 at 01:41:04PM +0300, Evgeniy Baskov wrote:
> This patchset is aimed
> * to improve UEFI compatibility of compressed kernel code for x86_64
> * to setup proper memory access attributes for code and rodata sections
> * to implement W^X protection policy throughout the whole execution 
>   of compressed kernel for EFISTUB code path. 

Hi Evgeniy,

I've tested this set of patches with the Mu firmware that supports the W^X
feature and a modified bootloader to also support it, and also with an
existing firmware and the grub2 build in fedora 36.  On the firmware
without W^X support, this all works for me.  With W^X support, it works
so long as I use CONFIG_EFI_STUB_EXTRACT_DIRECT, though I still need
some changes in grub's loader.  IMO that's a big step forward.

I can't currently make it work with W^X enabled but without direct
extraction, and I'm still investigating why not, but I figured I'd give
you a heads up.

-- 
        Peter


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/16] x86/boot: Align vmlinuz sections on page size
  2022-09-06 10:41 ` [PATCH 01/16] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
@ 2022-10-19  7:01   ` Ard Biesheuvel
  2022-10-20 11:13     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:01 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> To protect sections on page table level each section
> needs to be aligned on page size (4KB).
>
> Set sections alignment in linker script.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  arch/x86/boot/compressed/vmlinux.lds.S | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 112b2375d021..6be90f1a1198 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -27,21 +27,27 @@ SECTIONS
>                 HEAD_TEXT
>                 _ehead = . ;
>         }
> +       . = ALIGN(PAGE_SIZE);
>         .rodata..compressed : {
> +               _compressed = .;

Why are you adding these?

>                 *(.rodata..compressed)
> +               _ecompressed = .;
>         }
> +       . = ALIGN(PAGE_SIZE);

On other EFI architectures, we only distinguish between R-X and RW-
regions, and alignment between .rodata and .text is unnecessary. Do we
really need to deviate from that here?


>         .text : {
>                 _text = .;      /* Text */
>                 *(.text)
>                 *(.text.*)
>                 _etext = . ;
>         }
> +       . = ALIGN(PAGE_SIZE);
>         .rodata : {
>                 _rodata = . ;
>                 *(.rodata)       /* read-only data */
>                 *(.rodata.*)
>                 _erodata = . ;
>         }
> +       . = ALIGN(PAGE_SIZE);
>         .data : {
>                 _data = . ;
>                 *(.data)
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 02/16] x86/build: Remove RWX sections and align on 4KB
  2022-09-06 10:41 ` [PATCH 02/16] x86/build: Remove RWX sections and align on 4KB Evgeniy Baskov
@ 2022-10-19  7:04   ` Ard Biesheuvel
  2022-10-20 11:15     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:04 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Avoid creating sections with maximal privileges to prepare for W^X

privileges

> implementation. Align sections on page size (4KB) to allow protecting
> them in page table.
>

in the page tables.

> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  arch/x86/kernel/vmlinux.lds.S | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 15f29053cec4..6587e0201b50 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -102,12 +102,11 @@ jiffies = jiffies_64;
>  PHDRS {
>         text PT_LOAD FLAGS(5);          /* R_E */
>         data PT_LOAD FLAGS(6);          /* RW_ */
> -#ifdef CONFIG_X86_64
> -#ifdef CONFIG_SMP
> +#if defined(CONFIG_X86_64) && defined(CONFIG_SMP)
>         percpu PT_LOAD FLAGS(6);        /* RW_ */
>  #endif
> -       init PT_LOAD FLAGS(7);          /* RWE */
> -#endif
> +       inittext PT_LOAD FLAGS(5);      /* R_E */
> +       init PT_LOAD FLAGS(6);          /* RW_ */

Please explain in the commit log how this change affects X86_32

>         note PT_NOTE FLAGS(0);          /* ___ */
>  }
>
> @@ -226,9 +225,10 @@ SECTIONS
>  #endif
>
>         INIT_TEXT_SECTION(PAGE_SIZE)
> -#ifdef CONFIG_X86_64
> -       :init
> -#endif
> +       :inittext
> +
> +       . = ALIGN(PAGE_SIZE);
> +
>
>         /*
>          * Section for code used exclusively before alternatives are run. All
> @@ -240,6 +240,7 @@ SECTIONS
>         .altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
>                 *(.altinstr_aux)
>         }
> +       :init
>
>         INIT_DATA_SECTION(16)
>
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline
  2022-09-06 10:41 ` [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
@ 2022-10-19  7:06   ` Ard Biesheuvel
  2022-10-20 11:23     ` Evgeniy Baskov
  2022-10-19  7:44   ` Andrew Cooper
  1 sibling, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:06 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Ensure WP bit to be set to prevent boot code from writing to
> non-writable memory pages.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  arch/x86/boot/compressed/head_64.S | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index d33f060900d2..5273367283b7 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -619,9 +619,8 @@ SYM_CODE_START(trampoline_32bit_src)
>         /* Set up new stack */
>         leal    TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
>
> -       /* Disable paging */
> -       movl    %cr0, %eax
> -       btrl    $X86_CR0_PG_BIT, %eax

Why do we no longer care about CR0's prior value?

> +       /* Disable paging and setup CR0 */
> +       movl    $(CR0_STATE & ~X86_CR0_PG), %eax
>         movl    %eax, %cr0
>
>         /* Check what paging mode we want to be in after the trampoline */
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 04/16] x86/boot: Increase boot page table size
  2022-09-06 10:41 ` [PATCH 04/16] x86/boot: Increase boot page table size Evgeniy Baskov
@ 2022-10-19  7:08   ` Ard Biesheuvel
  2022-10-20 11:29     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:08 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Previous calculations ignored pages implicitly mapped by ACPI code,

I'm not sure I understand what this means. Which ACPI code and which
pages does it map?

> so theoretical upper limit is higher than was set.
>
> Using 4KB pages is desirable for better memory protection granularity.
> Approximately twice as much memory is required for those.
>
> Increase initial page table size to 64 4KB page tables.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  arch/x86/include/asm/boot.h | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> index 9191280d9ea3..024d972c248e 100644
> --- a/arch/x86/include/asm/boot.h
> +++ b/arch/x86/include/asm/boot.h
> @@ -41,22 +41,24 @@
>  # define BOOT_STACK_SIZE       0x4000
>
>  # define BOOT_INIT_PGT_SIZE    (6*4096)
> -# ifdef CONFIG_RANDOMIZE_BASE
>  /*
>   * Assuming all cross the 512GB boundary:
>   * 1 page for level4
> - * (2+2)*4 pages for kernel, param, cmd_line, and randomized kernel
> - * 2 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
> - * Total is 19 pages.
> + * (3+3)*2 pages for param and cmd_line
> + * (2+2+S)*2 pages for kernel and randomized kernel, where S is total number
> + *     of sections of kernel. Explanation: 2+2 are upper level page tables.
> + *     We can have only S unaligned parts of section: 1 at the end of the kernel
> + *     and (S-1) at the section borders. The start address of the kernel is
> + *     aligned, so an extra page table. There are at most S=6 sections in
> + *     vmlinux ELF image.
> + * 3 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
> + * Total is 36 pages.
> + *
> + * Some pages are also required for UEFI memory map and
> + * ACPI table mappings, so we need to add extra space.
> + * FIXME: Figure out exact amount of pages.
>   */
> -#  ifdef CONFIG_X86_VERBOSE_BOOTUP
> -#   define BOOT_PGT_SIZE       (19*4096)
> -#  else /* !CONFIG_X86_VERBOSE_BOOTUP */
> -#   define BOOT_PGT_SIZE       (17*4096)
> -#  endif
> -# else /* !CONFIG_RANDOMIZE_BASE */
> -#  define BOOT_PGT_SIZE                BOOT_INIT_PGT_SIZE
> -# endif
> +# define BOOT_PGT_SIZE         (64*4096)
>
>  #else /* !CONFIG_X86_64 */
>  # define BOOT_STACK_SIZE       0x1000
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/16] x86/boot: Support 4KB pages for identity mapping
  2022-09-06 10:41 ` [PATCH 05/16] x86/boot: Support 4KB pages for identity mapping Evgeniy Baskov
@ 2022-10-19  7:11   ` Ard Biesheuvel
  2022-10-20 11:30     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:11 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Current identity mapping code only supports 2M and 1G pages.
> 4KB pages are desirable for better memory protection granularity
> in compressed kernel code.
>
> Change identity mapping code to support 4KB pages and
> memory remapping with different attributes.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

This looks reasonable to me but someone on team-x86 will need to review this.

One nit below

> ---
>  arch/x86/include/asm/init.h |   1 +
>  arch/x86/mm/ident_map.c     | 186 +++++++++++++++++++++++++++++-------
>  2 files changed, 155 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 5f1d3c421f68..a8277ee82c51 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -8,6 +8,7 @@ struct x86_mapping_info {
>         unsigned long page_flag;         /* page flag for PMD or PUD entry */
>         unsigned long offset;            /* ident mapping offset */
>         bool direct_gbpages;             /* PUD level 1GB page support */
> +       bool allow_4kpages;              /* Allow more granular mappings with 4K pages */
>         unsigned long kernpg_flag;       /* kernel pagetable flag override */
>  };
>
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 968d7005f4a7..ad455d4ef595 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -2,26 +2,130 @@
>  /*
>   * Helper routines for building identity mapping page tables. This is
>   * included by both the compressed kernel and the regular kernel.
> + *

Drop this change

>   */
>
> -static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
> -                          unsigned long addr, unsigned long end)
> +static void ident_pte_init(struct x86_mapping_info *info, pte_t *pte_page,
> +                          unsigned long addr, unsigned long end,
> +                          unsigned long flags)
>  {
> -       addr &= PMD_MASK;
> -       for (; addr < end; addr += PMD_SIZE) {
> +       addr &= PAGE_MASK;
> +       for (; addr < end; addr += PAGE_SIZE) {
> +               pte_t *pte = pte_page + pte_index(addr);
> +
> +               set_pte(pte, __pte((addr - info->offset) | flags));
> +       }
> +}
> +
> +pte_t *ident_split_large_pmd(struct x86_mapping_info *info,
> +                            pmd_t *pmdp, unsigned long page_addr)
> +{
> +       unsigned long pmd_addr, page_flags;
> +       pte_t *pte;
> +
> +       pte = (pte_t *)info->alloc_pgt_page(info->context);
> +       if (!pte)
> +               return NULL;
> +
> +       pmd_addr = page_addr & PMD_MASK;
> +
> +       /* Not a large page - clear PSE flag */
> +       page_flags = pmd_flags(*pmdp) & ~_PSE;
> +       ident_pte_init(info, pte, pmd_addr, pmd_addr + PMD_SIZE, page_flags);
> +
> +       return pte;
> +}
> +
> +static int ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
> +                         unsigned long addr, unsigned long end,
> +                         unsigned long flags)
> +{
> +       unsigned long next;
> +       bool new_table = 0;
> +
> +       for (; addr < end; addr = next) {
>                 pmd_t *pmd = pmd_page + pmd_index(addr);
> +               pte_t *pte;
>
> -               if (pmd_present(*pmd))
> +               next = (addr & PMD_MASK) + PMD_SIZE;
> +               if (next > end)
> +                       next = end;
> +
> +               /*
> +                * Use 2M pages if 4k pages are not allowed or
> +                * we are not mapping extra, i.e. address and size are aligned.
> +                */
> +
> +               if (!info->allow_4kpages ||
> +                   (!(addr & ~PMD_MASK) && next == addr + PMD_SIZE)) {
> +
> +                       pmd_t pmdval;
> +
> +                       addr &= PMD_MASK;
> +                       pmdval = __pmd((addr - info->offset) | flags | _PSE);
> +                       set_pmd(pmd, pmdval);
>                         continue;
> +               }
> +
> +               /*
> +                * If currently mapped page is large, we need to split it.
> +                * The case when we can remap 2M page to 2M page
> +                * with different flags is already covered above.
> +                *
> +                * If there's nothing mapped to desired address,
> +                * we need to allocate new page table.
> +                */
>
> -               set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
> +               if (pmd_large(*pmd)) {
> +                       pte = ident_split_large_pmd(info, pmd, addr);
> +                       new_table = 1;
> +               } else if (!pmd_present(*pmd)) {
> +                       pte = (pte_t *)info->alloc_pgt_page(info->context);
> +                       new_table = 1;
> +               } else {
> +                       pte = pte_offset_kernel(pmd, 0);
> +                       new_table = 0;
> +               }
> +
> +               if (!pte)
> +                       return -ENOMEM;
> +
> +               ident_pte_init(info, pte, addr, next, flags);
> +
> +               if (new_table)
> +                       set_pmd(pmd, __pmd(__pa(pte) | info->kernpg_flag));
>         }
> +
> +       return 0;
>  }
>
> +
> +pmd_t *ident_split_large_pud(struct x86_mapping_info *info,
> +                            pud_t *pudp, unsigned long page_addr)
> +{
> +       unsigned long pud_addr, page_flags;
> +       pmd_t *pmd;
> +
> +       pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> +       if (!pmd)
> +               return NULL;
> +
> +       pud_addr = page_addr & PUD_MASK;
> +
> +       /* Not a large page - clear PSE flag */
> +       page_flags = pud_flags(*pudp) & ~_PSE;
> +       ident_pmd_init(info, pmd, pud_addr, pud_addr + PUD_SIZE, page_flags);
> +
> +       return pmd;
> +}
> +
> +
>  static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>                           unsigned long addr, unsigned long end)
>  {
>         unsigned long next;
> +       bool new_table = 0;
> +       int result;
>
>         for (; addr < end; addr = next) {
>                 pud_t *pud = pud_page + pud_index(addr);
> @@ -31,28 +135,39 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>                 if (next > end)
>                         next = end;
>
> +               /* Use 1G pages only if forced, even if they are supported. */
>                 if (info->direct_gbpages) {
>                         pud_t pudval;
> -
> -                       if (pud_present(*pud))
> -                               continue;
> +                       unsigned long flags;
>
>                         addr &= PUD_MASK;
> -                       pudval = __pud((addr - info->offset) | info->page_flag);
> +                       flags = info->page_flag | _PSE;
> +                       pudval = __pud((addr - info->offset) | flags);
> +
>                         set_pud(pud, pudval);
>                         continue;
>                 }
>
> -               if (pud_present(*pud)) {
> +               if (pud_large(*pud)) {
> +                       pmd = ident_split_large_pud(info, pud, addr);
> +                       new_table = 1;
> +               } else if (!pud_present(*pud)) {
> +                       pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> +                       new_table = 1;
> +               } else {
>                         pmd = pmd_offset(pud, 0);
> -                       ident_pmd_init(info, pmd, addr, next);
> -                       continue;
> +                       new_table = 0;
>                 }
> -               pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> +
>                 if (!pmd)
>                         return -ENOMEM;
> -               ident_pmd_init(info, pmd, addr, next);
> -               set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
> +
> +               result = ident_pmd_init(info, pmd, addr, next, info->page_flag);
> +               if (result)
> +                       return result;
> +
> +               if (new_table)
> +                       set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
>         }
>
>         return 0;
> @@ -63,6 +178,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>  {
>         unsigned long next;
>         int result;
> +       bool new_table = 0;
>
>         for (; addr < end; addr = next) {
>                 p4d_t *p4d = p4d_page + p4d_index(addr);
> @@ -72,15 +188,14 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>                 if (next > end)
>                         next = end;
>
> -               if (p4d_present(*p4d)) {
> +               if (!p4d_present(*p4d)) {
> +                       pud = (pud_t *)info->alloc_pgt_page(info->context);
> +                       new_table = 1;
> +               } else {
>                         pud = pud_offset(p4d, 0);
> -                       result = ident_pud_init(info, pud, addr, next);
> -                       if (result)
> -                               return result;
> -
> -                       continue;
> +                       new_table = 0;
>                 }
> -               pud = (pud_t *)info->alloc_pgt_page(info->context);
> +
>                 if (!pud)
>                         return -ENOMEM;
>
> @@ -88,19 +203,22 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>                 if (result)
>                         return result;
>
> -               set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
> +               if (new_table)
> +                       set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
>         }
>
>         return 0;
>  }
>
> -int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> -                             unsigned long pstart, unsigned long pend)
> +int kernel_ident_mapping_init(struct x86_mapping_info *info,
> +                             pgd_t *pgd_page, unsigned long pstart,
> +                             unsigned long pend)
>  {
>         unsigned long addr = pstart + info->offset;
>         unsigned long end = pend + info->offset;
>         unsigned long next;
>         int result;
> +       bool new_table;
>
>         /* Set the default pagetable flags if not supplied */
>         if (!info->kernpg_flag)
> @@ -117,20 +235,24 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>                 if (next > end)
>                         next = end;
>
> -               if (pgd_present(*pgd)) {
> +               if (!pgd_present(*pgd)) {
> +                       p4d = (p4d_t *)info->alloc_pgt_page(info->context);
> +                       new_table = 1;
> +               } else {
>                         p4d = p4d_offset(pgd, 0);
> -                       result = ident_p4d_init(info, p4d, addr, next);
> -                       if (result)
> -                               return result;
> -                       continue;
> +                       new_table = 0;
>                 }
>
> -               p4d = (p4d_t *)info->alloc_pgt_page(info->context);
>                 if (!p4d)
>                         return -ENOMEM;
> +
>                 result = ident_p4d_init(info, p4d, addr, next);
>                 if (result)
>                         return result;
> +
> +               if (!new_table)
> +                       continue;
> +
>                 if (pgtable_l5_enabled()) {
>                         set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag));
>                 } else {
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 06/16] x86/boot: Setup memory protection for bzImage code
  2022-09-06 10:41 ` [PATCH 06/16] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
@ 2022-10-19  7:17   ` Ard Biesheuvel
  2022-10-20 12:07     ` Evgeniy Baskov
  2022-10-19  7:57   ` Andrew Cooper
  1 sibling, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:17 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Use previously added code to use 4KB pages for mapping. Map compressed
> and uncompressed kernel with appropriate memory protection attributes.
> For compressed kernel set them up manually. For uncompressed kernel
> used flags specified in ELF header.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>
>  delete mode 100644 arch/x86/boot/compressed/pgtable.h
>  create mode 100644 arch/x86/include/asm/shared/pgtable.h
> ---
>  arch/x86/boot/compressed/head_64.S      | 25 ++++++-
>  arch/x86/boot/compressed/ident_map_64.c | 96 ++++++++++++++++---------
>  arch/x86/boot/compressed/misc.c         | 63 ++++++++++++++--
>  arch/x86/boot/compressed/misc.h         | 16 ++++-
>  arch/x86/boot/compressed/pgtable.h      | 20 ------
>  arch/x86/boot/compressed/pgtable_64.c   |  2 +-
>  arch/x86/boot/compressed/sev.c          |  6 +-
>  arch/x86/include/asm/shared/pgtable.h   | 29 ++++++++
>  8 files changed, 193 insertions(+), 64 deletions(-)
>  delete mode 100644 arch/x86/boot/compressed/pgtable.h
>  create mode 100644 arch/x86/include/asm/shared/pgtable.h
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 5273367283b7..889ca7176aa7 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -35,7 +35,7 @@
>  #include <asm/bootparam.h>
>  #include <asm/desc_defs.h>
>  #include <asm/trapnr.h>
> -#include "pgtable.h"
> +#include <asm/shared/pgtable.h>
>
>  /*
>   * Locally defined symbols should be marked hidden:
> @@ -578,6 +578,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>         pushq   %rsi
>         call    load_stage2_idt
>
> +       call    startup32_enable_nx_if_supported
>         /* Pass boot_params to initialize_identity_maps() */
>         movq    (%rsp), %rdi
>         call    initialize_identity_maps
> @@ -602,6 +603,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>         jmp     *%rax
>  SYM_FUNC_END(.Lrelocated)
>
> +SYM_FUNC_START_LOCAL_NOALIGN(startup32_enable_nx_if_supported)

Why the startup32_ prefix for this function name?

> +       pushq   %rbx
> +
> +       leaq    has_nx(%rip), %rcx
> +
> +       mov     $0x80000001, %eax
> +       cpuid
> +       btl     $20, %edx
> +       jnc     .Lnonx
> +
> +       movl    $1, (%rcx)
> +
> +       movl    $MSR_EFER, %ecx
> +       rdmsr
> +       btsl    $_EFER_NX, %eax
> +       wrmsr
> +
> +.Lnonx:
> +       popq    %rbx
> +       RET
> +SYM_FUNC_END(startup32_enable_nx_if_supported)
> +
>         .code32
>  /*
>   * This is the 32-bit trampoline that will be copied over to low memory.
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index d4a314cc50d6..880e08293023 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -28,6 +28,7 @@
>  #include <asm/trap_pf.h>
>  #include <asm/trapnr.h>
>  #include <asm/init.h>
> +#include <asm/shared/pgtable.h>
>  /* Use the static base for this part of the boot process */
>  #undef __PAGE_OFFSET
>  #define __PAGE_OFFSET __PAGE_OFFSET_BASE
> @@ -86,24 +87,46 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>   * Due to relocation, pointers must be assigned at run time not build time.
>   */
>  static struct x86_mapping_info mapping_info;
> +int has_nx;
>
>  /*
>   * Adds the specified range to the identity mappings.
>   */
> -void kernel_add_identity_map(unsigned long start, unsigned long end)
> +unsigned long kernel_add_identity_map(unsigned long start,
> +                                     unsigned long end,
> +                                     unsigned int flags)
>  {
>         int ret;
>
>         /* Align boundary to 2M. */
> -       start = round_down(start, PMD_SIZE);
> -       end = round_up(end, PMD_SIZE);
> +       start = round_down(start, PAGE_SIZE);
> +       end = round_up(end, PAGE_SIZE);
>         if (start >= end)
> -               return;
> +               return start;
> +
> +       /* Enforce W^X -- just stop booting with error on violation. */
> +       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) &&
> +           (flags & (MAP_EXEC | MAP_WRITE)) == (MAP_EXEC | MAP_WRITE))
> +               error("Error: W^X violation\n");
> +

Do we need to add a new failure mode here?

> +       bool nx = !(flags & MAP_EXEC) && has_nx;
> +       bool ro = !(flags & MAP_WRITE);
> +
> +       mapping_info.page_flag = sme_me_mask | (nx ?
> +               (ro ? __PAGE_KERNEL_RO : __PAGE_KERNEL) :
> +               (ro ? __PAGE_KERNEL_ROX : __PAGE_KERNEL_EXEC));
>
>         /* Build the mapping. */
> -       ret = kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, start, end);
> +       ret = kernel_ident_mapping_init(&mapping_info,
> +                                       (pgd_t *)top_level_pgt,
> +                                       start, end);
>         if (ret)
>                 error("Error: kernel_ident_mapping_init() failed\n");
> +
> +       if (!(flags & MAP_NOFLUSH))
> +               write_cr3(top_level_pgt);
> +
> +       return start;
>  }
>
>  /* Locates and clears a region for a new top level page table. */
> @@ -112,14 +135,17 @@ void initialize_identity_maps(void *rmode)
>         unsigned long cmdline;
>         struct setup_data *sd;
>
> +       boot_params = rmode;
> +
>         /* Exclude the encryption mask from __PHYSICAL_MASK */
>         physical_mask &= ~sme_me_mask;
>
>         /* Init mapping_info with run-time function/buffer pointers. */
>         mapping_info.alloc_pgt_page = alloc_pgt_page;
>         mapping_info.context = &pgt_data;
> -       mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask;
> +       mapping_info.page_flag = __PAGE_KERNEL_EXEC | sme_me_mask;
>         mapping_info.kernpg_flag = _KERNPG_TABLE;
> +       mapping_info.allow_4kpages = 1;
>
>         /*
>          * It should be impossible for this not to already be true,
> @@ -154,15 +180,34 @@ void initialize_identity_maps(void *rmode)
>         /*
>          * New page-table is set up - map the kernel image, boot_params and the
>          * command line. The uncompressed kernel requires boot_params and the
> -        * command line to be mapped in the identity mapping. Map them
> -        * explicitly here in case the compressed kernel does not touch them,
> -        * or does not touch all the pages covering them.
> +        * command line to be mapped in the identity mapping.
> +        * Every other accessed memory region is mapped later, if required.
>          */
> -       kernel_add_identity_map((unsigned long)_head, (unsigned long)_end);
> -       boot_params = rmode;
> -       kernel_add_identity_map((unsigned long)boot_params, (unsigned long)(boot_params + 1));
> +       extern char _head[], _ehead[];

Please move these extern declarations out of the function scope (at
the very least)

> +       kernel_add_identity_map((unsigned long)_head,
> +                               (unsigned long)_ehead, MAP_EXEC | MAP_NOFLUSH);
> +
> +       extern char _compressed[], _ecompressed[];
> +       kernel_add_identity_map((unsigned long)_compressed,
> +                               (unsigned long)_ecompressed, MAP_WRITE | MAP_NOFLUSH);
> +
> +       extern char _text[], _etext[];
> +       kernel_add_identity_map((unsigned long)_text,
> +                               (unsigned long)_etext, MAP_EXEC | MAP_NOFLUSH);
> +
> +       extern char _rodata[], _erodata[];
> +       kernel_add_identity_map((unsigned long)_rodata,
> +                               (unsigned long)_erodata, MAP_NOFLUSH);
> +

Same question as before: do we really need three different regions for
rodata+text here?

> +       extern char _data[], _end[];
> +       kernel_add_identity_map((unsigned long)_data,
> +                               (unsigned long)_end, MAP_WRITE | MAP_NOFLUSH);
> +
> +       kernel_add_identity_map((unsigned long)boot_params,
> +                               (unsigned long)(boot_params + 1), MAP_WRITE | MAP_NOFLUSH);
> +
>         cmdline = get_cmd_line_ptr();
> -       kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
> +       kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE, MAP_NOFLUSH);
>
>         /*
>          * Also map the setup_data entries passed via boot_params in case they
> @@ -172,7 +217,7 @@ void initialize_identity_maps(void *rmode)
>         while (sd) {
>                 unsigned long sd_addr = (unsigned long)sd;
>
> -               kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);
> +               kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len, MAP_NOFLUSH);
>                 sd = (struct setup_data *)sd->next;
>         }
>
> @@ -185,26 +230,11 @@ void initialize_identity_maps(void *rmode)
>  static pte_t *split_large_pmd(struct x86_mapping_info *info,
>                               pmd_t *pmdp, unsigned long __address)
>  {
> -       unsigned long page_flags;
> -       unsigned long address;
> -       pte_t *pte;
> -       pmd_t pmd;
> -       int i;
> -
> -       pte = (pte_t *)info->alloc_pgt_page(info->context);
> +       unsigned long address = __address & PMD_MASK;
> +       pte_t *pte = ident_split_large_pmd(info, pmdp, address);
>         if (!pte)
>                 return NULL;
>
> -       address     = __address & PMD_MASK;
> -       /* No large page - clear PSE flag */
> -       page_flags  = info->page_flag & ~_PAGE_PSE;
> -
> -       /* Populate the PTEs */
> -       for (i = 0; i < PTRS_PER_PMD; i++) {
> -               set_pte(&pte[i], __pte(address | page_flags));
> -               address += PAGE_SIZE;
> -       }
> -
>         /*
>          * Ideally we need to clear the large PMD first and do a TLB
>          * flush before we write the new PMD. But the 2M range of the
> @@ -214,7 +244,7 @@ static pte_t *split_large_pmd(struct x86_mapping_info *info,
>          * also the only user of the page-table, so there is no chance
>          * of a TLB multihit.
>          */
> -       pmd = __pmd((unsigned long)pte | info->kernpg_flag);
> +       pmd_t pmd = __pmd((unsigned long)pte | info->kernpg_flag);
>         set_pmd(pmdp, pmd);
>         /* Flush TLB to establish the new PMD */
>         write_cr3(top_level_pgt);
> @@ -377,5 +407,5 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
>          * Error code is sane - now identity map the 2M region around
>          * the faulting address.
>          */
> -       kernel_add_identity_map(address, end);
> +       kernel_add_identity_map(address, end, MAP_WRITE);
>  }
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index cf690d8712f4..d377e434c4e3 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -14,10 +14,10 @@
>
>  #include "misc.h"
>  #include "error.h"
> -#include "pgtable.h"
>  #include "../string.h"
>  #include "../voffset.h"
>  #include <asm/bootparam_utils.h>
> +#include <asm/shared/pgtable.h>
>
>  /*
>   * WARNING!!
> @@ -277,7 +277,8 @@ static inline void handle_relocations(void *output, unsigned long output_len,
>  { }
>  #endif
>
> -static void parse_elf(void *output)
> +static void parse_elf(void *output, unsigned long output_len,
> +                     unsigned long virt_addr)
>  {
>  #ifdef CONFIG_X86_64
>         Elf64_Ehdr ehdr;
> @@ -287,6 +288,7 @@ static void parse_elf(void *output)
>         Elf32_Phdr *phdrs, *phdr;
>  #endif
>         void *dest;
> +       unsigned long addr;
>         int i;
>
>         memcpy(&ehdr, output, sizeof(ehdr));
> @@ -323,10 +325,50 @@ static void parse_elf(void *output)
>  #endif
>                         memmove(dest, output + phdr->p_offset, phdr->p_filesz);
>                         break;
> -               default: /* Ignore other PT_* */ break;
> +               default:
> +                       /* Ignore other PT_* */
> +                       break;
> +               }
> +       }
> +
> +       handle_relocations(output, output_len, virt_addr);
> +
> +       if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> +               goto skip_protect;
> +
> +       for (i = 0; i < ehdr.e_phnum; i++) {
> +               phdr = &phdrs[i];
> +
> +               switch (phdr->p_type) {
> +               case PT_LOAD:
> +#ifdef CONFIG_RELOCATABLE
> +                       addr = (unsigned long)output;
> +                       addr += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
> +#else
> +                       addr = phdr->p_paddr;
> +#endif
> +                       /*
> +                        * Simultaneously readable and writable segments are
> +                        * violating W^X, and should not be present in vmlinux image.
> +                        */
> +                       if ((phdr->p_flags & (PF_X | PF_W)) == (PF_X | PF_W))
> +                               error("W^X violation for ELF segment");
> +

Can we catch this at build time instead?

> +                       unsigned int flags = MAP_PROTECT;
> +                       if (phdr->p_flags & PF_X)
> +                               flags |= MAP_EXEC;
> +                       if (phdr->p_flags & PF_W)
> +                               flags |= MAP_WRITE;
> +
> +                       kernel_add_identity_map(addr, addr + phdr->p_memsz, flags);
> +                       break;
> +               default:
> +                       /* Ignore other PT_* */
> +                       break;
>                 }
>         }
>
> +skip_protect:
>         free(phdrs);
>  }
>
> @@ -434,6 +476,18 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>                                 needed_size,
>                                 &virt_addr);
>
> +       unsigned long phys_addr = (unsigned long)output;
> +
> +       /*
> +        * If KASLR is disabled input and output regions may overlap.
> +        * In this case we need to map region excutable as well.
> +        */
> +       unsigned long map_flags = MAP_ALLOC | MAP_WRITE |
> +                       (IS_ENABLED(CONFIG_RANDOMIZE_BASE) ? 0 : MAP_EXEC);
> +       output = (unsigned char *)kernel_add_identity_map(phys_addr,
> +                                                         phys_addr + needed_size,
> +                                                         map_flags);
> +
>         /* Validate memory location choices. */
>         if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
>                 error("Destination physical address inappropriately aligned");
> @@ -456,8 +510,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>         debug_putstr("\nDecompressing Linux... ");
>         __decompress(input_data, input_len, NULL, NULL, output, output_len,
>                         NULL, error);
> -       parse_elf(output);
> -       handle_relocations(output, output_len, virt_addr);
> +       parse_elf(output, output_len, virt_addr);
>         debug_putstr("done.\nBooting the kernel.\n");
>
>         /* Disable exception handling before booting the kernel */
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 62208ec04ca4..a4f99516f310 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -171,8 +171,20 @@ static inline int count_immovable_mem_regions(void) { return 0; }
>  #ifdef CONFIG_X86_5LEVEL
>  extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
>  #endif
> -extern void kernel_add_identity_map(unsigned long start, unsigned long end);
> -
> +#ifdef CONFIG_X86_64
> +extern unsigned long kernel_add_identity_map(unsigned long start,
> +                                            unsigned long end,
> +                                            unsigned int flags);
> +#else
> +static inline unsigned long kernel_add_identity_map(unsigned long start,
> +                                                   unsigned long end,
> +                                                   unsigned int flags)
> +{
> +       (void)flags;
> +       (void)end;

Why these (void) casts? Can we just drop them?


> +       return start;
> +}
> +#endif
>  /* Used by PAGE_KERN* macros: */
>  extern pteval_t __default_kernel_pte_mask;
>
> diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
> deleted file mode 100644
> index cc9b2529a086..000000000000
> --- a/arch/x86/boot/compressed/pgtable.h
> +++ /dev/null
> @@ -1,20 +0,0 @@
> -#ifndef BOOT_COMPRESSED_PAGETABLE_H
> -#define BOOT_COMPRESSED_PAGETABLE_H
> -
> -#define TRAMPOLINE_32BIT_SIZE          (2 * PAGE_SIZE)
> -
> -#define TRAMPOLINE_32BIT_PGTABLE_OFFSET        0
> -
> -#define TRAMPOLINE_32BIT_CODE_OFFSET   PAGE_SIZE
> -#define TRAMPOLINE_32BIT_CODE_SIZE     0x80
> -
> -#define TRAMPOLINE_32BIT_STACK_END     TRAMPOLINE_32BIT_SIZE
> -
> -#ifndef __ASSEMBLER__
> -
> -extern unsigned long *trampoline_32bit;
> -
> -extern void trampoline_32bit_src(void *return_ptr);
> -
> -#endif /* __ASSEMBLER__ */
> -#endif /* BOOT_COMPRESSED_PAGETABLE_H */
> diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
> index 2ac12ff4111b..c7cf5a1059a8 100644
> --- a/arch/x86/boot/compressed/pgtable_64.c
> +++ b/arch/x86/boot/compressed/pgtable_64.c
> @@ -2,7 +2,7 @@
>  #include "misc.h"
>  #include <asm/e820/types.h>
>  #include <asm/processor.h>
> -#include "pgtable.h"
> +#include <asm/shared/pgtable.h>
>  #include "../string.h"
>  #include "efi.h"
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index c93930d5ccbd..99f3ad0b30f3 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -13,6 +13,7 @@
>  #include "misc.h"
>
>  #include <asm/pgtable_types.h>
> +#include <asm/shared/pgtable.h>
>  #include <asm/sev.h>
>  #include <asm/trapnr.h>
>  #include <asm/trap_pf.h>
> @@ -435,10 +436,11 @@ void sev_prep_identity_maps(unsigned long top_level_pgt)
>                 unsigned long cc_info_pa = boot_params->cc_blob_address;
>                 struct cc_blob_sev_info *cc_info;
>
> -               kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info));
> +               kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info), MAP_NOFLUSH);
>
>                 cc_info = (struct cc_blob_sev_info *)cc_info_pa;
> -               kernel_add_identity_map(cc_info->cpuid_phys, cc_info->cpuid_phys + cc_info->cpuid_len);
> +               kernel_add_identity_map(cc_info->cpuid_phys,
> +                                       cc_info->cpuid_phys + cc_info->cpuid_len, MAP_NOFLUSH);
>         }
>
>         sev_verify_cbit(top_level_pgt);
> diff --git a/arch/x86/include/asm/shared/pgtable.h b/arch/x86/include/asm/shared/pgtable.h
> new file mode 100644
> index 000000000000..6527dadf39d6
> --- /dev/null
> +++ b/arch/x86/include/asm/shared/pgtable.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef ASM_SHARED_PAGETABLE_H
> +#define ASM_SHARED_PAGETABLE_H
> +
> +#define MAP_WRITE      0x02 /* Writable memory */
> +#define MAP_EXEC       0x04 /* Executable memory */
> +#define MAP_ALLOC      0x10 /* Range needs to be allocated */
> +#define MAP_PROTECT    0x20 /* Set exact memory attributes for memory range */
> +#define MAP_NOFLUSH    0x40 /* Avoid flushing TLB */
> +
> +#define TRAMPOLINE_32BIT_SIZE          (3 * PAGE_SIZE)
> +
> +#define TRAMPOLINE_32BIT_PLACEMENT_MAX (0xA0000)
> +
> +#define TRAMPOLINE_32BIT_PGTABLE_OFFSET        0
> +
> +#define TRAMPOLINE_32BIT_CODE_OFFSET   PAGE_SIZE
> +#define TRAMPOLINE_32BIT_CODE_SIZE     0x80
> +
> +#define TRAMPOLINE_32BIT_STACK_END     TRAMPOLINE_32BIT_SIZE
> +
> +#ifndef __ASSEMBLER__
> +
> +extern unsigned long *trampoline_32bit;
> +
> +extern void trampoline_32bit_src(void *return_ptr);
> +
> +#endif /* __ASSEMBLER__ */
> +#endif /* ASM_SHARED_PAGETABLE_H */
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 08/16] x86/boot: Remove mapping from page fault handler
  2022-09-06 10:41 ` [PATCH 08/16] x86/boot: Remove mapping from page fault handler Evgeniy Baskov
@ 2022-10-19  7:20   ` Ard Biesheuvel
  0 siblings, 0 replies; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:20 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> After every implicit mapping is removed, this code is no longer needed.
>
> Remove memory mapping from page fault handler to ensure that there are
> no hidden invalid memory accesses.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

I don't grok this 100% but to me, it seems not having to rely on a
page fault handler to ensure that the 1:1 mapping has sufficient
coverage is a win so

Acked-by: Ard Biesheuvel <ardb@kernel.org>

> ---
>  arch/x86/boot/compressed/ident_map_64.c | 26 ++++++++++---------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index 880e08293023..c20cd31e665f 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -385,27 +385,21 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
>         unsigned long address = native_read_cr2();
>         unsigned long end;
> -       bool ghcb_fault;
> +       char *msg;
>
> -       ghcb_fault = sev_es_check_ghcb_fault(address);
> +       if (sev_es_check_ghcb_fault(address))
> +               msg = "Page-fault on GHCB page:";
> +       else
> +               msg = "Unexpected page-fault:";
>
>         address   &= PMD_MASK;
>         end        = address + PMD_SIZE;
>
>         /*
> -        * Check for unexpected error codes. Unexpected are:
> -        *      - Faults on present pages
> -        *      - User faults
> -        *      - Reserved bits set
> -        */
> -       if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
> -               do_pf_error("Unexpected page-fault:", error_code, address, regs->ip);
> -       else if (ghcb_fault)
> -               do_pf_error("Page-fault on GHCB page:", error_code, address, regs->ip);
> -
> -       /*
> -        * Error code is sane - now identity map the 2M region around
> -        * the faulting address.
> +        * Since all memory allocations are made explicit
> +        * now, every page fault at this stage is an
> +        * error and the error handler is there only
> +        * for debug purposes.
>          */
> -       kernel_add_identity_map(address, end, MAP_WRITE);
> +       do_pf_error(msg, error_code, address, regs->ip);
>  }
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/16] efi/libstub: Move helper function to related file
  2022-09-06 10:41 ` [PATCH 09/16] efi/libstub: Move helper function to related file Evgeniy Baskov
@ 2022-10-19  7:21   ` Ard Biesheuvel
  0 siblings, 0 replies; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:21 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> efi_adjust_memory_range_protection() can be useful outside x86-stub.c.
>
> Move it to mem.c, where memory related code resides and make it
> non-static.
>
> Change its behavior to setup exact attibutes and disallow making memory
> regions readable and writable simultaniosly for supported
> configurations.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

Acked-by: Ard Biesheuvel <ardb@kernel.org>

> ---
>  drivers/firmware/efi/libstub/efistub.h  |   4 +
>  drivers/firmware/efi/libstub/mem.c      | 101 ++++++++++++++++++++++++
>  drivers/firmware/efi/libstub/x86-stub.c |  67 ++--------------
>  3 files changed, 111 insertions(+), 61 deletions(-)
>
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index b0ae0a454404..22fe28385db7 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -907,6 +907,10 @@ efi_status_t efi_relocate_kernel(unsigned long *image_addr,
>                                  unsigned long alignment,
>                                  unsigned long min_addr);
>
> +efi_status_t efi_adjust_memory_range_protection(unsigned long start,
> +                                               unsigned long size,
> +                                               unsigned long attributes);
> +
>  efi_status_t efi_parse_options(char const *cmdline);
>
>  void efi_parse_option_graphics(char *option);
> diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
> index feef8d4be113..89ebc8ad2c22 100644
> --- a/drivers/firmware/efi/libstub/mem.c
> +++ b/drivers/firmware/efi/libstub/mem.c
> @@ -130,3 +130,104 @@ void efi_free(unsigned long size, unsigned long addr)
>         nr_pages = round_up(size, EFI_ALLOC_ALIGN) / EFI_PAGE_SIZE;
>         efi_bs_call(free_pages, addr, nr_pages);
>  }
> +
> +/**
> + * efi_adjust_memory_range_protection() - change memory range protection attributes
> + * @start:     memory range start address
> + * @size:      memory range size
> + *
> + * Actual memory range for which memory attributes are modified is
> + * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
> + * that includes [start, start + size].
> + *
> + * @return: status code
> + */
> +efi_status_t efi_adjust_memory_range_protection(unsigned long start,
> +                                               unsigned long size,
> +                                               unsigned long attributes)
> +{
> +       efi_status_t status;
> +       efi_gcd_memory_space_desc_t desc;
> +       efi_physical_addr_t end, next;
> +       efi_physical_addr_t rounded_start, rounded_end;
> +       efi_physical_addr_t unprotect_start, unprotect_size;
> +       int has_system_memory = 0;
> +
> +       if (efi_dxe_table == NULL)
> +               return EFI_UNSUPPORTED;
> +
> +       /*
> +        * This function should not be used to modify attributes
> +        * other than writable/executable.
> +        */
> +
> +       if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
> +               return EFI_INVALID_PARAMETER;
> +
> +       /*
> +        * Disallow simultaniously executable and writable memory
> +        * to inforce W^X policy if direct extraction code is enabled.
> +        */
> +
> +       if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
> +           IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))
> +               return EFI_INVALID_PARAMETER;
> +
> +       rounded_start = rounddown(start, EFI_PAGE_SIZE);
> +       rounded_end = roundup(start + size, EFI_PAGE_SIZE);
> +
> +       /*
> +        * Don't modify memory region attributes, they are
> +        * already suitable, to lower the possibility to
> +        * encounter firmware bugs.
> +        */
> +
> +       for (end = start + size; start < end; start = next) {
> +
> +               status = efi_dxe_call(get_memory_space_descriptor,
> +                                     start, &desc);
> +
> +               if (status != EFI_SUCCESS) {
> +                       efi_warn("Unable to get memory descriptor at %lx\n",
> +                                start);
> +                       return status;
> +               }
> +
> +               next = desc.base_address + desc.length;
> +
> +               /*
> +                * Only system memory is suitable for trampoline/kernel image
> +                * placement, so only this type of memory needs its attributes
> +                * to be modified.
> +                */
> +
> +               if (desc.gcd_memory_type != EfiGcdMemoryTypeSystemMemory) {
> +                       efi_warn("Attempted to change protection of special memory range\n");
> +                       return EFI_UNSUPPORTED;
> +               }
> +
> +               if (((desc.attributes ^ attributes) &
> +                    (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0)
> +                       continue;
> +
> +               desc.attributes &= ~(EFI_MEMORY_RO | EFI_MEMORY_XP);
> +               desc.attributes |= attributes;
> +
> +               unprotect_start = max(rounded_start, desc.base_address);
> +               unprotect_size = min(rounded_end, next) - unprotect_start;
> +
> +               status = efi_dxe_call(set_memory_space_attributes,
> +                                     unprotect_start, unprotect_size,
> +                                     desc.attributes);
> +
> +               if (status != EFI_SUCCESS) {
> +                       efi_warn("Unable to unprotect memory range [%08lx,%08lx]: %lx\n",
> +                                (unsigned long)unprotect_start,
> +                                (unsigned long)(unprotect_start + unprotect_size),
> +                                status);
> +                       return status;
> +               }
> +       }
> +
> +       return EFI_SUCCESS;
> +}
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 05ae8bcc9d67..678f9c2ccafc 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -212,62 +212,6 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
>         }
>  }
>
> -static void
> -adjust_memory_range_protection(unsigned long start, unsigned long size)
> -{
> -       efi_status_t status;
> -       efi_gcd_memory_space_desc_t desc;
> -       unsigned long end, next;
> -       unsigned long rounded_start, rounded_end;
> -       unsigned long unprotect_start, unprotect_size;
> -       int has_system_memory = 0;
> -
> -       if (efi_dxe_table == NULL)
> -               return;
> -
> -       rounded_start = rounddown(start, EFI_PAGE_SIZE);
> -       rounded_end = roundup(start + size, EFI_PAGE_SIZE);
> -
> -       /*
> -        * Don't modify memory region attributes, they are
> -        * already suitable, to lower the possibility to
> -        * encounter firmware bugs.
> -        */
> -
> -       for (end = start + size; start < end; start = next) {
> -
> -               status = efi_dxe_call(get_memory_space_descriptor, start, &desc);
> -
> -               if (status != EFI_SUCCESS)
> -                       return;
> -
> -               next = desc.base_address + desc.length;
> -
> -               /*
> -                * Only system memory is suitable for trampoline/kernel image placement,
> -                * so only this type of memory needs its attributes to be modified.
> -                */
> -
> -               if (desc.gcd_memory_type != EfiGcdMemoryTypeSystemMemory ||
> -                   (desc.attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0)
> -                       continue;
> -
> -               unprotect_start = max(rounded_start, (unsigned long)desc.base_address);
> -               unprotect_size = min(rounded_end, next) - unprotect_start;
> -
> -               status = efi_dxe_call(set_memory_space_attributes,
> -                                     unprotect_start, unprotect_size,
> -                                     EFI_MEMORY_WB);
> -
> -               if (status != EFI_SUCCESS) {
> -                       efi_warn("Unable to unprotect memory range [%08lx,%08lx]: %lx\n",
> -                                unprotect_start,
> -                                unprotect_start + unprotect_size,
> -                                status);
> -               }
> -       }
> -}
> -
>  /*
>   * Trampoline takes 2 pages and can be loaded in first megabyte of memory
>   * with its end placed between 128k and 640k where BIOS might start.
> @@ -291,12 +235,12 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
>          * and relocated kernel image.
>          */
>
> -       adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
> -                                      TRAMPOLINE_PLACEMENT_SIZE);
> +       efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
> +                                          TRAMPOLINE_PLACEMENT_SIZE, 0);
>
>  #ifdef CONFIG_64BIT
>         if (image_base != (unsigned long)startup_32)
> -               adjust_memory_range_protection(image_base, image_size);
> +               efi_adjust_memory_range_protection(image_base, image_size, 0);
>  #else
>         /*
>          * Clear protection flags on a whole range of possible
> @@ -306,8 +250,9 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
>          * need to remove possible protection on relocated image
>          * itself disregarding further relocations.
>          */
> -       adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
> -                                      KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
> +       efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
> +                                          KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR,
> +                                          0);
>  #endif
>  }
>
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 10/16] x86/boot: Make console interface more abstract
  2022-09-06 10:41 ` [PATCH 10/16] x86/boot: Make console interface more abstract Evgeniy Baskov
@ 2022-10-19  7:23   ` Ard Biesheuvel
  2022-10-20 12:10     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:23 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> To be able to extract kernel from EFI, console output functions
> need to be replaceable by alternative implementations.
>
> Make all of those functions pointers.
> Move serial console code to separate file.
>

What does kernel_add_identity_map() have to do with the above? Should
that be a separate patch?

> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  arch/x86/boot/compressed/Makefile       |   2 +-
>  arch/x86/boot/compressed/ident_map_64.c |  15 ++-
>  arch/x86/boot/compressed/misc.c         | 109 +--------------------
>  arch/x86/boot/compressed/misc.h         |  13 ++-
>  arch/x86/boot/compressed/putstr.c       | 124 ++++++++++++++++++++++++
>  5 files changed, 146 insertions(+), 117 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/putstr.c
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 35ce1a64068b..29411864bfcd 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -92,7 +92,7 @@ $(obj)/misc.o: $(obj)/../voffset.h
>
>  vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o \
>         $(obj)/misc.o $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
> -       $(obj)/piggy.o $(obj)/cpuflags.o
> +       $(obj)/piggy.o $(obj)/cpuflags.o $(obj)/putstr.o
>
>  vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
>  vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index c20cd31e665f..c39373687e50 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -89,12 +89,20 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
>  static struct x86_mapping_info mapping_info;
>  int has_nx;
>
> +/*
> + * This points to actual implementation of mapping function.
> + * Either the one down below or the UEFI API wrapper.
> + */
> +unsigned long (*kernel_add_identity_map)(unsigned long start,
> +                                        unsigned long end,
> +                                        unsigned int flags);
> +
>  /*
>   * Adds the specified range to the identity mappings.
>   */
> -unsigned long kernel_add_identity_map(unsigned long start,
> -                                     unsigned long end,
> -                                     unsigned int flags)
> +unsigned long kernel_add_identity_map_(unsigned long start,
> +                                      unsigned long end,
> +                                      unsigned int flags)
>  {
>         int ret;
>
> @@ -136,6 +144,7 @@ void initialize_identity_maps(void *rmode)
>         struct setup_data *sd;
>
>         boot_params = rmode;
> +       kernel_add_identity_map = kernel_add_identity_map_;
>
>         /* Exclude the encryption mask from __PHYSICAL_MASK */
>         physical_mask &= ~sme_me_mask;
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index d377e434c4e3..e2c0d05ac293 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -53,13 +53,6 @@ struct port_io_ops pio_ops;
>  memptr free_mem_ptr;
>  memptr free_mem_end_ptr;
>
> -static char *vidmem;
> -static int vidport;
> -
> -/* These might be accessed before .bss is cleared, so use .data instead. */
> -static int lines __section(".data");
> -static int cols __section(".data");
> -
>  #ifdef CONFIG_KERNEL_GZIP
>  #include "../../../../lib/decompress_inflate.c"
>  #endif
> @@ -92,95 +85,6 @@ static int cols __section(".data");
>   * ../header.S.
>   */
>
> -static void scroll(void)
> -{
> -       int i;
> -
> -       memmove(vidmem, vidmem + cols * 2, (lines - 1) * cols * 2);
> -       for (i = (lines - 1) * cols * 2; i < lines * cols * 2; i += 2)
> -               vidmem[i] = ' ';
> -}
> -
> -#define XMTRDY          0x20
> -
> -#define TXR             0       /*  Transmit register (WRITE) */
> -#define LSR             5       /*  Line Status               */
> -static void serial_putchar(int ch)
> -{
> -       unsigned timeout = 0xffff;
> -
> -       while ((inb(early_serial_base + LSR) & XMTRDY) == 0 && --timeout)
> -               cpu_relax();
> -
> -       outb(ch, early_serial_base + TXR);
> -}
> -
> -void __putstr(const char *s)
> -{
> -       int x, y, pos;
> -       char c;
> -
> -       if (early_serial_base) {
> -               const char *str = s;
> -               while (*str) {
> -                       if (*str == '\n')
> -                               serial_putchar('\r');
> -                       serial_putchar(*str++);
> -               }
> -       }
> -
> -       if (lines == 0 || cols == 0)
> -               return;
> -
> -       x = boot_params->screen_info.orig_x;
> -       y = boot_params->screen_info.orig_y;
> -
> -       while ((c = *s++) != '\0') {
> -               if (c == '\n') {
> -                       x = 0;
> -                       if (++y >= lines) {
> -                               scroll();
> -                               y--;
> -                       }
> -               } else {
> -                       vidmem[(x + cols * y) * 2] = c;
> -                       if (++x >= cols) {
> -                               x = 0;
> -                               if (++y >= lines) {
> -                                       scroll();
> -                                       y--;
> -                               }
> -                       }
> -               }
> -       }
> -
> -       boot_params->screen_info.orig_x = x;
> -       boot_params->screen_info.orig_y = y;
> -
> -       pos = (x + cols * y) * 2;       /* Update cursor position */
> -       outb(14, vidport);
> -       outb(0xff & (pos >> 9), vidport+1);
> -       outb(15, vidport);
> -       outb(0xff & (pos >> 1), vidport+1);
> -}
> -
> -void __puthex(unsigned long value)
> -{
> -       char alpha[2] = "0";
> -       int bits;
> -
> -       for (bits = sizeof(value) * 8 - 4; bits >= 0; bits -= 4) {
> -               unsigned long digit = (value >> bits) & 0xf;
> -
> -               if (digit < 0xA)
> -                       alpha[0] = '0' + digit;
> -               else
> -                       alpha[0] = 'a' + (digit - 0xA);
> -
> -               __putstr(alpha);
> -       }
> -}
> -
>  #ifdef CONFIG_X86_NEED_RELOCS
>  static void handle_relocations(void *output, unsigned long output_len,
>                                unsigned long virt_addr)
> @@ -407,17 +311,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>
>         sanitize_boot_params(boot_params);
>
> -       if (boot_params->screen_info.orig_video_mode == 7) {
> -               vidmem = (char *) 0xb0000;
> -               vidport = 0x3b4;
> -       } else {
> -               vidmem = (char *) 0xb8000;
> -               vidport = 0x3d4;
> -       }
> -
> -       lines = boot_params->screen_info.orig_video_lines;
> -       cols = boot_params->screen_info.orig_video_cols;
> -
>         init_default_io_ops();
>
>         /*
> @@ -428,7 +321,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>          */
>         early_tdx_detect();
>
> -       console_init();
> +       init_bare_console();
>
>         /*
>          * Save RSDP address for later use. Have this after console_init()
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index a4f99516f310..39dc3de50268 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -53,8 +53,8 @@ extern memptr free_mem_end_ptr;
>  void *malloc(int size);
>  void free(void *where);
>  extern struct boot_params *boot_params;
> -void __putstr(const char *s);
> -void __puthex(unsigned long value);
> +extern void (*__putstr)(const char *s);
> +extern void (*__puthex)(unsigned long value);
>  #define error_putstr(__x)  __putstr(__x)
>  #define error_puthex(__x)  __puthex(__x)
>
> @@ -124,6 +124,9 @@ static inline void console_init(void)
>  { }
>  #endif
>
> +/* putstr.c */
> +void init_bare_console(void);
> +
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  void sev_enable(struct boot_params *bp);
>  void sev_es_shutdown_ghcb(void);
> @@ -172,9 +175,9 @@ static inline int count_immovable_mem_regions(void) { return 0; }
>  extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
>  #endif
>  #ifdef CONFIG_X86_64
> -extern unsigned long kernel_add_identity_map(unsigned long start,
> -                                            unsigned long end,
> -                                            unsigned int flags);
> +extern unsigned long (*kernel_add_identity_map)(unsigned long start,
> +                                               unsigned long end,
> +                                               unsigned int flags);
>  #else
>  static inline unsigned long kernel_add_identity_map(unsigned long start,
>                                                     unsigned long end,
> diff --git a/arch/x86/boot/compressed/putstr.c b/arch/x86/boot/compressed/putstr.c
> new file mode 100644
> index 000000000000..accba0de8be9
> --- /dev/null
> +++ b/arch/x86/boot/compressed/putstr.c
> @@ -0,0 +1,124 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "misc.h"
> +
> +/* These might be accessed before .bss is cleared, so use .data instead. */
> +static char *vidmem __section(".data");
> +static int vidport __section(".data");
> +static int lines __section(".data");
> +static int cols __section(".data");
> +
> +void (*__putstr)(const char *s);
> +void (*__puthex)(unsigned long value);
> +
> +static void putstr(const char *s);
> +static void puthex(unsigned long value);
> +
> +void init_bare_console(void)
> +{
> +       __putstr = putstr;
> +       __puthex = puthex;
> +
> +       if (boot_params->screen_info.orig_video_mode == 7) {
> +               vidmem = (char *) 0xb0000;
> +               vidport = 0x3b4;
> +       } else {
> +               vidmem = (char *) 0xb8000;
> +               vidport = 0x3d4;
> +       }
> +
> +       lines = boot_params->screen_info.orig_video_lines;
> +       cols = boot_params->screen_info.orig_video_cols;
> +
> +       console_init();
> +}
> +
> +static void scroll(void)
> +{
> +       int i;
> +
> +       memmove(vidmem, vidmem + cols * 2, (lines - 1) * cols * 2);
> +       for (i = (lines - 1) * cols * 2; i < lines * cols * 2; i += 2)
> +               vidmem[i] = ' ';
> +}
> +
> +#define XMTRDY          0x20
> +
> +#define TXR             0       /*  Transmit register (WRITE) */
> +#define LSR             5       /*  Line Status               */
> +
> +static void serial_putchar(int ch)
> +{
> +       unsigned int timeout = 0xffff;
> +
> +       while ((inb(early_serial_base + LSR) & XMTRDY) == 0 && --timeout)
> +               cpu_relax();
> +
> +       outb(ch, early_serial_base + TXR);
> +}
> +
> +static void putstr(const char *s)
> +{
> +       int x, y, pos;
> +       char c;
> +
> +       if (early_serial_base) {
> +               const char *str = s;
> +
> +               while (*str) {
> +                       if (*str == '\n')
> +                               serial_putchar('\r');
> +                       serial_putchar(*str++);
> +               }
> +       }
> +
> +       if (lines == 0 || cols == 0)
> +               return;
> +
> +       x = boot_params->screen_info.orig_x;
> +       y = boot_params->screen_info.orig_y;
> +
> +       while ((c = *s++) != '\0') {
> +               if (c == '\n') {
> +                       x = 0;
> +                       if (++y >= lines) {
> +                               scroll();
> +                               y--;
> +                       }
> +               } else {
> +                       vidmem[(x + cols * y) * 2] = c;
> +                       if (++x >= cols) {
> +                               x = 0;
> +                               if (++y >= lines) {
> +                                       scroll();
> +                                       y--;
> +                               }
> +                       }
> +               }
> +       }
> +
> +       boot_params->screen_info.orig_x = x;
> +       boot_params->screen_info.orig_y = y;
> +
> +       pos = (x + cols * y) * 2;       /* Update cursor position */
> +       outb(14, vidport);
> +       outb(0xff & (pos >> 9), vidport+1);
> +       outb(15, vidport);
> +       outb(0xff & (pos >> 1), vidport+1);
> +}
> +
> +static void puthex(unsigned long value)
> +{
> +       char alpha[2] = "0";
> +       int bits;
> +
> +       for (bits = sizeof(value) * 8 - 4; bits >= 0; bits -= 4) {
> +               unsigned long digit = (value >> bits) & 0xf;
> +
> +               if (digit < 0xA)
> +                       alpha[0] = '0' + digit;
> +               else
> +                       alpha[0] = 'a' + (digit - 0xA);
> +
> +               putstr(alpha);
> +       }
> +}
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 12/16] x86/boot: Add EFI kernel extraction interface
  2022-09-06 10:41 ` [PATCH 12/16] x86/boot: Add EFI kernel extraction interface Evgeniy Baskov
@ 2022-10-19  7:27   ` Ard Biesheuvel
  2022-10-20 12:14     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:27 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> To enable extraction of kernel image from EFI stub code directly
> extraction code needs to have separate interface that avoid part
> of low level initialization logic, i.e. serial port setup.
>
> Add kernel extraction function callable from libstub as a part
> of preparation for extracting the kernel directly from EFI environment.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  arch/x86/boot/compressed/head_32.S |  3 +-
>  arch/x86/boot/compressed/head_64.S |  2 +-
>  arch/x86/boot/compressed/misc.c    | 85 +++++++++++++++++++++---------
>  arch/x86/boot/compressed/misc.h    |  2 +
>  arch/x86/boot/compressed/putstr.c  |  9 ++++
>  5 files changed, 73 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index 3b354eb9516d..b46a1c4109cf 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -217,8 +217,7 @@ SYM_DATA(image_offset, .long 0)
>   */
>         .bss
>         .balign 4
> -boot_heap:
> -       .fill BOOT_HEAP_SIZE, 1, 0
> +SYM_DATA(boot_heap,    .fill BOOT_HEAP_SIZE, 1, 0)
>  boot_stack:
>         .fill BOOT_STACK_SIZE, 1, 0
>  boot_stack_end:
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 889ca7176aa7..37ce094571b5 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -1007,7 +1007,7 @@ SYM_FUNC_END(startup32_check_sev_cbit)
>   */
>         .bss
>         .balign 4
> -SYM_DATA_LOCAL(boot_heap,      .fill BOOT_HEAP_SIZE, 1, 0)
> +SYM_DATA(boot_heap,    .fill BOOT_HEAP_SIZE, 1, 0)
>
>  SYM_DATA_START_LOCAL(boot_stack)
>         .fill BOOT_STACK_SIZE, 1, 0
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index e2c0d05ac293..8016cc5c300e 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -293,11 +293,11 @@ static void parse_elf(void *output, unsigned long output_len,
>   *             |-------uncompressed kernel image---------|
>   *
>   */
> -asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> -                                 unsigned char *input_data,
> -                                 unsigned long input_len,
> -                                 unsigned char *output,
> -                                 unsigned long output_len)
> +static void *do_extract_kernel(void *rmode,
> +                              unsigned char *input_data,
> +                              unsigned long input_len,
> +                              unsigned char *output,
> +                              unsigned long output_len)
>  {
>         const unsigned long kernel_total_size = VO__end - VO__text;
>         unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
> @@ -311,18 +311,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>
>         sanitize_boot_params(boot_params);
>
> -       init_default_io_ops();
> -
> -       /*
> -        * Detect TDX guest environment.
> -        *
> -        * It has to be done before console_init() in order to use
> -        * paravirtualized port I/O operations if needed.
> -        */
> -       early_tdx_detect();
> -
> -       init_bare_console();
> -
>         /*
>          * Save RSDP address for later use. Have this after console_init()
>          * so that early debugging output from the RSDP parsing code can be
> @@ -330,11 +318,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>          */
>         boot_params->acpi_rsdp_addr = get_rsdp_addr();
>
> -       debug_putstr("early console in extract_kernel\n");
> -
> -       free_mem_ptr     = heap;        /* Heap */
> -       free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
> -
>         /*
>          * The memory hole needed for the kernel is the larger of either
>          * the entire decompressed kernel plus relocation table, or the
> @@ -387,12 +370,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>         if (virt_addr & (MIN_KERNEL_ALIGN - 1))
>                 error("Destination virtual address inappropriately aligned");
>  #ifdef CONFIG_X86_64
> -       if (heap > 0x3fffffffffffUL)
> +       if (phys_addr > 0x3fffffffffffUL)
>                 error("Destination address too large");
>         if (virt_addr + max(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
>                 error("Destination virtual address is beyond the kernel mapping area");
>  #else
> -       if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
> +       if (phys_addr > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
>                 error("Destination address too large");
>  #endif
>  #ifndef CONFIG_RELOCATABLE
> @@ -406,12 +389,64 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>         parse_elf(output, output_len, virt_addr);
>         debug_putstr("done.\nBooting the kernel.\n");
>
> +       return output;
> +}
> +
> +asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> +                                 unsigned char *input_data,
> +                                 unsigned long input_len,
> +                                 unsigned char *output,
> +                                 unsigned long output_len)
> +{
> +       void *entry;
> +
> +       init_default_io_ops();
> +
> +       /*
> +        * Detect TDX guest environment.
> +        *
> +        * It has to be done before console_init() in order to use
> +        * paravirtualized port I/O operations if needed.
> +        */
> +       early_tdx_detect();
> +
> +       init_bare_console();
> +
> +       debug_putstr("early console in extract_kernel\n");
> +
> +       free_mem_ptr     = heap;        /* Heap */
> +       free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
> +
> +       entry = do_extract_kernel(rmode, input_data,
> +                                 input_len, output, output_len);
> +
>         /* Disable exception handling before booting the kernel */
>         cleanup_exception_handling();
>
> -       return output;
> +       return entry;
>  }
>
> +void *efi_extract_kernel(struct boot_params *rmode,
> +                        struct efi_iofunc *iofunc,
> +                        unsigned char *input_data,
> +                        unsigned long input_len,
> +                        unsigned char *output,
> +                        unsigned long output_len)
> +{
> +       extern char boot_heap[BOOT_HEAP_SIZE];
> +
> +       free_mem_ptr     = (unsigned long)boot_heap;    /* Heap */
> +       free_mem_end_ptr = (unsigned long)boot_heap + BOOT_HEAP_SIZE;
> +
> +       init_efi_console(iofunc);
> +
> +       return do_extract_kernel(rmode, input_data,
> +                                input_len, output, output_len);
> +}
> +
> +
> +
> +
>  void fortify_panic(const char *name)
>  {
>         error("detected buffer overflow");
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 39dc3de50268..b5aa0af6c59e 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -26,6 +26,7 @@
>  #include <asm/boot.h>
>  #include <asm/bootparam.h>
>  #include <asm/desc_defs.h>
> +#include <asm/shared/extract.h>
>
>  #include "tdx.h"
>
> @@ -126,6 +127,7 @@ static inline void console_init(void)
>
>  /* putstr.c */
>  void init_bare_console(void);
> +void init_efi_console(struct efi_iofunc *iofunc);
>
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  void sev_enable(struct boot_params *bp);
> diff --git a/arch/x86/boot/compressed/putstr.c b/arch/x86/boot/compressed/putstr.c
> index accba0de8be9..238d9677df61 100644
> --- a/arch/x86/boot/compressed/putstr.c
> +++ b/arch/x86/boot/compressed/putstr.c
> @@ -32,6 +32,15 @@ void init_bare_console(void)
>         console_init();
>  }
>
> +void init_efi_console(struct efi_iofunc *iofunc)

struct efi_iofunc does not exist yet

> +{
> +       __putstr = iofunc->putstr;
> +       __puthex = iofunc->puthex;
> +#ifdef CONFIG_X86_64
> +       kernel_add_identity_map = iofunc->map_range;
> +#endif
> +}
> +
>  static void scroll(void)
>  {
>         int i;
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 13/16] efi/x86: Support extracting kernel from libstub
  2022-09-06 10:41 ` [PATCH 13/16] efi/x86: Support extracting kernel from libstub Evgeniy Baskov
@ 2022-10-19  7:35   ` Ard Biesheuvel
  2022-10-20 12:36     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:35 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Doing it that way allows setting up stricter memory attributes,
> simplifies boot code path and removes potential relocation
> of kernel image.
>
> Wire up required interfaces and minimally initialize zero page
> fields needed for it to function correctly.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>
>  create mode 100644 arch/x86/include/asm/shared/extract.h
>  create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
> ---
>  arch/x86/boot/compressed/head_32.S            |   6 +-
>  arch/x86/boot/compressed/head_64.S            |  45 ++++
>  arch/x86/include/asm/shared/extract.h         |  25 ++
>  drivers/firmware/efi/Kconfig                  |  14 ++
>  drivers/firmware/efi/libstub/Makefile         |   1 +
>  drivers/firmware/efi/libstub/efistub.h        |   5 +
>  .../firmware/efi/libstub/x86-extract-direct.c | 220 ++++++++++++++++++
>  drivers/firmware/efi/libstub/x86-stub.c       |  45 ++--
>  8 files changed, 343 insertions(+), 18 deletions(-)
>  create mode 100644 arch/x86/include/asm/shared/extract.h
>  create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
>
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index b46a1c4109cf..d2866f06bc9f 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -155,7 +155,11 @@ SYM_FUNC_START(efi32_stub_entry)
>         add     $0x4, %esp
>         movl    8(%esp), %esi   /* save boot_params pointer */
>         call    efi_main
> -       /* efi_main returns the possibly relocated address of startup_32 */
> +
> +       /*
> +        * efi_main returns the possibly
> +        * relocated address of exteracted kernel entry point.

extracted

> +        */
>         jmp     *%eax
>  SYM_FUNC_END(efi32_stub_entry)
>  SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 37ce094571b5..b6bae8e7ee71 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -555,9 +555,54 @@ SYM_FUNC_START(efi64_stub_entry)
>         and     $~0xf, %rsp                     /* realign the stack */
>         movq    %rdx, %rbx                      /* save boot_params pointer */
>         call    efi_main
> +
> +#ifdef CONFIG_EFI_STUB_EXTRACT_DIRECT
> +       cld
> +       cli
> +
> +       movq    %rbx, %rdi /* boot_params */
> +       movq    %rax, %rsi /* decompressed kernel address */
> +
> +       /* Make sure we have GDT with 32-bit code segment */
> +       leaq    gdt64(%rip), %rax
> +       addq    %rax, 2(%rax)
> +       lgdt    (%rax)
> +
> +       /* Setup data segments. */
> +       xorl    %eax, %eax
> +       movl    %eax, %ds
> +       movl    %eax, %es
> +       movl    %eax, %ss
> +       movl    %eax, %fs
> +       movl    %eax, %gs
> +
> +       pushq   %rsi
> +       pushq   %rdi
> +
> +       call startup32_enable_nx_if_supported
> +
> +       call    trampoline_pgtable_init
> +       movq    %rax, %rdx
> +
> +
> +       /* Swap %rsi and %rsi */
> +       popq    %rsi
> +       popq    %rdi
> +
> +       /* Save the trampoline address in RCX */
> +       movq    trampoline_32bit(%rip), %rcx
> +
> +       /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
> +       pushq   $__KERNEL32_CS
> +       leaq    TRAMPOLINE_32BIT_CODE_OFFSET(%rcx), %rax
> +       pushq   %rax
> +       lretq
> +#else
>         movq    %rbx,%rsi
>         leaq    rva(startup_64)(%rax), %rax
>         jmp     *%rax
> +#endif
> +
>  SYM_FUNC_END(efi64_stub_entry)
>  SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
>  #endif
> diff --git a/arch/x86/include/asm/shared/extract.h b/arch/x86/include/asm/shared/extract.h
> new file mode 100644
> index 000000000000..163678145884
> --- /dev/null
> +++ b/arch/x86/include/asm/shared/extract.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef ASM_SHARED_EXTRACT_H
> +#define ASM_SHARED_EXTRACT_H
> +
> +#define MAP_WRITE      0x02 /* Writable memory */
> +#define MAP_EXEC       0x04 /* Executable memory */
> +#define MAP_ALLOC      0x10 /* Range needs to be allocated */
> +#define MAP_PROTECT    0x20 /* Set exact memory attributes for memory range */
> +
> +struct efi_iofunc {
> +       void (*putstr)(const char *msg);
> +       void (*puthex)(unsigned long x);
> +       unsigned long (*map_range)(unsigned long start,
> +                                  unsigned long end,
> +                                  unsigned int flags);

This looks a bit random - having a map_range() routine as a member of
the console I/O struct. Can we make this abstraction a bit more
natural?

> +};
> +
> +void *efi_extract_kernel(struct boot_params *rmode,
> +                        struct efi_iofunc *iofunc,
> +                        unsigned char *input_data,
> +                        unsigned long input_len,
> +                        unsigned char *output,
> +                        unsigned long output_len);
> +
> +#endif /* ASM_SHARED_EXTRACT_H */
> diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
> index 6cb7384ad2ac..2418402a0bda 100644
> --- a/drivers/firmware/efi/Kconfig
> +++ b/drivers/firmware/efi/Kconfig
> @@ -91,6 +91,20 @@ config EFI_DXE_MEM_ATTRIBUTES
>           Use DXE services to check and alter memory protection
>           attributes during boot via EFISTUB to ensure that memory
>           ranges used by the kernel are writable and executable.
> +         This option also enables stricter memory attributes
> +         on compressed kernel PE image.
> +
> +config EFI_STUB_EXTRACT_DIRECT
> +       bool "Extract kernel directly from UEFI environment"
> +       depends on EFI && EFI_STUB && X86_64
> +       default y

What is the reason for making this configurable? Couldn't we just
enable it unconditionally?

> +       help
> +         Extract kernel before exiting EFI boot services
> +         This allows maintaining W^X for kernel image for
> +         the whole execution of compressed kernel code.
> +         This also slightly improves efficiency of extraction
> +         code by removing the need to copy the kernel around
> +         and rebuild page tables.
>
>  config EFI_PARAMS_FROM_FDT
>         bool
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index d0537573501e..1cea7d913356 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -69,6 +69,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB)        += efi-stub.o fdt.o string.o \
>  lib-$(CONFIG_ARM)              += arm32-stub.o
>  lib-$(CONFIG_ARM64)            += arm64-stub.o
>  lib-$(CONFIG_X86)              += x86-stub.o
> +lib-$(CONFIG_EFI_STUB_EXTRACT_DIRECT)  += x86-extract-direct.o
>  lib-$(CONFIG_RISCV)            += riscv-stub.o
>  CFLAGS_arm32-stub.o            := -DTEXT_OFFSET=$(TEXT_OFFSET)
>
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index 22fe28385db7..cdd1bb50c786 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -968,6 +968,11 @@ static inline void
>  efi_enable_reset_attack_mitigation(void) { }
>  #endif
>
> +#ifdef CONFIG_X86
> +unsigned long extract_kernel_direct(struct boot_params *boot_params);
> +void startup_32(struct boot_params *boot_params);
> +#endif
> +

Please put this somewhere else

>  void efi_retrieve_tpm2_eventlog(void);
>
>  #endif
> diff --git a/drivers/firmware/efi/libstub/x86-extract-direct.c b/drivers/firmware/efi/libstub/x86-extract-direct.c
> new file mode 100644
> index 000000000000..6076bd75cfd6
> --- /dev/null
> +++ b/drivers/firmware/efi/libstub/x86-extract-direct.c
> @@ -0,0 +1,220 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/acpi.h>
> +#include <linux/efi.h>
> +#include <linux/elf.h>
> +#include <linux/stddef.h>
> +
> +#include <asm/efi.h>
> +#include <asm/e820/types.h>
> +#include <asm/desc.h>
> +#include <asm/boot.h>
> +#include <asm/bootparam_utils.h>
> +#include <asm/shared/extract.h>
> +#include <asm/shared/pgtable.h>
> +
> +#include "efistub.h"
> +
> +static void do_puthex(unsigned long value);
> +static void do_putstr(const char *msg);
> +

Can we get rid of these forward declarations?

> +static unsigned long do_map_range(unsigned long start,
> +                                 unsigned long end,
> +                                 unsigned int flags)
> +{
> +       efi_status_t status;
> +
> +       unsigned long size = end - start;
> +
> +       if (flags & MAP_ALLOC) {
> +               if (start == (unsigned long)startup_32)
> +                       start = LOAD_PHYSICAL_ADDR;
> +
> +               unsigned long addr;
> +
> +               status = efi_low_alloc_above(size, CONFIG_PHYSICAL_ALIGN,
> +                                            &addr, start);
> +               if (status != EFI_SUCCESS)
> +                       efi_err("Unable to allocate memory for uncompressed kernel");
> +
> +               if (start != addr) {
> +                       efi_debug("Unable to allocate at given address"
> +                                 " (desired=0x%lx, actual=0x%lx)",
> +                                 (unsigned long)start, addr);
> +                       start = addr;
> +               }
> +       }
> +
> +       if (flags & (MAP_PROTECT | MAP_ALLOC)) {
> +               unsigned long attr = 0;
> +
> +               if (!(flags & MAP_EXEC))
> +                       attr |= EFI_MEMORY_XP;
> +
> +               if (!(flags & MAP_WRITE))
> +                       attr |= EFI_MEMORY_RO;
> +
> +               status = efi_adjust_memory_range_protection(start,
> +                                                           end - start,
> +                                                           attr);
> +               if (status != EFI_SUCCESS)
> +                       efi_err("Unable to protect memory range");
> +       }
> +
> +       return start;
> +}
> +
> +/*
> + * Trampoline takes 3 pages and can be loaded in first megabyte of memory
> + * with its end placed between 0 and 640k where BIOS might start.
> + * (see arch/x86/boot/compressed/pgtable_64.c)
> + */
> +
> +#ifdef CONFIG_64BIT
> +static efi_status_t prepare_trampoline(void)
> +{
> +       efi_status_t status;
> +
> +       status = efi_allocate_pages(TRAMPOLINE_32BIT_SIZE,
> +                                   (unsigned long *)&trampoline_32bit,
> +                                   TRAMPOLINE_32BIT_PLACEMENT_MAX);
> +
> +       if (status != EFI_SUCCESS)
> +               return status;
> +
> +       unsigned long trampoline_start = (unsigned long)trampoline_32bit;
> +
> +       memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
> +
> +       /* First page of trampoline is a top level page table */
> +       efi_adjust_memory_range_protection(trampoline_start,
> +                                          PAGE_SIZE,
> +                                          EFI_MEMORY_XP);
> +
> +       /* Second page of trampoline is the code (with a padding) */
> +
> +       void *caddr = (void *)trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET;
> +
> +       memcpy(caddr, trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
> +
> +       efi_adjust_memory_range_protection((unsigned long)caddr,
> +                                          PAGE_SIZE,
> +                                          EFI_MEMORY_RO);
> +
> +       /* And the last page of trampoline is the stack */
> +
> +       efi_adjust_memory_range_protection(trampoline_start + 2 * PAGE_SIZE,
> +                                          PAGE_SIZE,
> +                                          EFI_MEMORY_XP);
> +
> +       return EFI_SUCCESS;
> +}
> +#else
> +static inline efi_status_t prepare_trampoline(void)
> +{
> +       return EFI_SUCCESS;
> +}
> +#endif
> +
> +static efi_status_t init_loader_data(struct boot_params *params)
> +{
> +       struct efi_info *efi = (void *)&params->efi_info;
> +       efi_status_t status;
> +
> +       unsigned long map_size, desc_size, buff_size;
> +       u32 desc_ver;
> +       efi_memory_desc_t *mmap;
> +
> +       struct efi_boot_memmap map = {
> +               .map            = &mmap,
> +               .map_size       = &map_size,
> +               .desc_size      = &desc_size,
> +               .desc_ver       = &desc_ver,
> +               .key_ptr        = NULL,
> +               .buff_size      = &buff_size,
> +       };
> +
> +       status = efi_get_memory_map(&map);

efi_get_memory_map() has been updated in the mean time, so this needs a rewrite.

> +
> +       if (status != EFI_SUCCESS) {
> +               efi_err("Unable to get EFI memory map...\n");
> +               return status;
> +       }
> +
> +       const char *signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
> +                                              : EFI32_LOADER_SIGNATURE;
> +
> +       memcpy(&efi->efi_loader_signature, signature, sizeof(__u32));
> +
> +       efi->efi_memdesc_size = desc_size;
> +       efi->efi_memdesc_version = desc_ver;
> +       efi->efi_memmap_size = map_size;
> +
> +       efi_set_u64_split((unsigned long)mmap,
> +                         &efi->efi_memmap, &efi->efi_memmap_hi);
> +
> +       efi_set_u64_split((unsigned long)efi_system_table,
> +                         &efi->efi_systab, &efi->efi_systab_hi);
> +
> +       return EFI_SUCCESS;
> +}
> +
> +static void free_loader_data(struct boot_params *params)
> +{
> +       struct efi_info *efi = (void *)&params->efi_info;
> +       unsigned long mmap = efi->efi_memmap;
> +
> +#ifdef CONFIG_64BIT
> +       mmap |= ((unsigned long)efi->efi_memmap_hi << 32);
> +#endif
> +
> +       efi_bs_call(free_pool, (void *)mmap);
> +
> +       efi->efi_memdesc_size = 0;
> +       efi->efi_memdesc_version = 0;
> +       efi->efi_memmap_size = 0;
> +       efi_set_u64_split(0, &efi->efi_memmap, &efi->efi_memmap_hi);
> +}
> +
> +unsigned long extract_kernel_direct(struct boot_params *params)
> +{
> +
> +       extern unsigned char input_data[];
> +       extern unsigned int output_len, input_len;
> +
> +       void *res;
> +       efi_status_t status;
> +       struct efi_iofunc iof = { 0 };
> +
> +       status = prepare_trampoline();
> +
> +       if (status != EFI_SUCCESS)
> +               return 0;
> +
> +       /* Prepare environment for do_extract_kernel() call */
> +       status = init_loader_data(params);
> +
> +       if (status != EFI_SUCCESS)
> +               return 0;
> +
> +       iof.puthex = do_puthex;
> +       iof.putstr = do_putstr;
> +       iof.map_range = do_map_range;
> +
> +       res = efi_extract_kernel(params, &iof, input_data, input_len,
> +                                (unsigned char *)startup_32, output_len);
> +
> +       free_loader_data(params);
> +
> +       return (unsigned long)res;
> +}
> +
> +static void do_puthex(unsigned long value)
> +{
> +       efi_printk("%08lx", value);
> +}
> +
> +static void do_putstr(const char *msg)
> +{
> +       efi_printk("%s", msg);
> +}
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 678f9c2ccafc..680184034cb7 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -230,26 +230,25 @@ static void
>  setup_memory_protection(unsigned long image_base, unsigned long image_size)
>  {
>         /*
> -        * Allow execution of possible trampoline used
> -        * for switching between 4- and 5-level page tables
> -        * and relocated kernel image.
> -        */
> +       * Allow execution of possible trampoline used
> +       * for switching between 4- and 5-level page tables
> +       * and relocated kernel image.
> +       */
>

Drop this hunk please

>         efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
>                                            TRAMPOLINE_PLACEMENT_SIZE, 0);
>
>  #ifdef CONFIG_64BIT
> -       if (image_base != (unsigned long)startup_32)
> -               efi_adjust_memory_range_protection(image_base, image_size, 0);
> +       efi_adjust_memory_range_protection(image_base, image_size, 0);
>  #else
>         /*
> -        * Clear protection flags on a whole range of possible
> -        * addresses used for KASLR. We don't need to do that
> -        * on x86_64, since KASLR/extraction is performed after
> -        * dedicated identity page tables are built and we only
> -        * need to remove possible protection on relocated image
> -        * itself disregarding further relocations.
> -        */
> +       * Clear protection flags on a whole range of possible
> +       * addresses used for KASLR. We don't need to do that
> +       * on x86_64, since KASLR/extraction is performed after
> +       * dedicated identity page tables are built and we only
> +       * need to remove possible protection on relocated image
> +       * itself disregarding further relocations.
> +       */

Drop this hunk please

>         efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
>                                            KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR,
>                                            0);
> @@ -270,8 +269,10 @@ static void setup_quirks(struct boot_params *boot_params,
>                         retrieve_apple_device_properties(boot_params);
>         }
>
> -       if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
> +       if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES) &&
> +           !IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT)) {
>                 setup_memory_protection(image_base, image_size);
> +       }
>  }
>
>  /*
> @@ -710,8 +711,10 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
>  }
>
>  /*
> - * On success, we return the address of startup_32, which has potentially been
> - * relocated by efi_relocate_kernel.
> + * On success, we return:
> + *   - the address of startup_32, which has potentially been
> + *     relocated by efi_relocate_kernel, if libstub direct extraction is disabled.
> + *   - extracted kernel entry point if libstub direct extraction is enabled.
>   * On failure, we exit to the firmware via efi_exit instead of returning.
>   */
>  unsigned long efi_main(efi_handle_t handle,
> @@ -736,6 +739,7 @@ unsigned long efi_main(efi_handle_t handle,
>                 efi_dxe_table = NULL;
>         }
>
> +#ifndef CONFIG_EFI_STUB_EXTRACT_DIRECT
>         /*
>          * If the kernel isn't already loaded at a suitable address,
>          * relocate it.
> @@ -789,6 +793,7 @@ unsigned long efi_main(efi_handle_t handle,
>                  */
>                 image_offset = 0;
>         }
> +#endif
>
>  #ifdef CONFIG_CMDLINE_BOOL
>         status = efi_parse_options(CONFIG_CMDLINE);
> @@ -845,7 +850,13 @@ unsigned long efi_main(efi_handle_t handle,
>
>         setup_efi_pci(boot_params);
>
> -       setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
> +       setup_quirks(boot_params, buffer_start, buffer_end - buffer_start);
> +
> +#ifdef CONFIG_EFI_STUB_EXTRACT_DIRECT
> +       bzimage_addr = extract_kernel_direct(boot_params);
> +       if (!bzimage_addr)
> +               goto fail;
> +#endif
>
>         status = exit_boot(boot_params, handle);
>         if (status != EFI_SUCCESS) {
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/16] x86/build: Make generated PE more spec compliant
  2022-09-06 10:41 ` [PATCH 14/16] x86/build: Make generated PE more spec compliant Evgeniy Baskov
@ 2022-10-19  7:39   ` Ard Biesheuvel
  2022-10-20 13:07     ` Evgeniy Baskov
  0 siblings, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:39 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Currently kernel image is not fully compliant PE image, so it may
> fail to boot with stricter implementations of UEFI PE loaders.
>
> Set minimal alignments and sizes specified by PE documentation [1]
> referenced by UEFI specification [2]. Align PE header to 8 bytes.


> Generate '.reloc' section with 2 entries and set reloc data directory.

Why?


>
> To make code more readable refactor tools/build.c:
>         - Use mmap() to access kernel image.
>         - Generate sections dynamically.
>         - Setup sections protection. Since we cannot fit every
>           needed section, set a part of protection flags
>           dynamically during initialization. This step is omitted
>           if CONFIG_EFI_DXE_MEM_ATTRIBUTES is not set.
>

If the commit log of a patch contains a bulleted list of the changes
that it implements, it is a very strong indicator that it needs to be
split up. Presenting this as a big ball of changes makes the life of a
reviewed unnecessarily hard.

> Reduce boot sector error message size since the space for the PE header
> before the zero page beginning is constrained.
>
> Explicitly change sections permissions in efi_pe_entry in case
> of incorrect EFI implementations and to reduce access rights to
> compressed kernel blob. By default it is set executable due to
> restriction in maximum number of sections that can fit before zero
> page.
>
> [1] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
> [2] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  arch/x86/boot/Makefile                  |   2 +-
>  arch/x86/boot/header.S                  | 110 +----
>  arch/x86/boot/tools/build.c             | 575 +++++++++++++++---------
>  drivers/firmware/efi/libstub/x86-stub.c |  63 ++-
>  4 files changed, 452 insertions(+), 298 deletions(-)
>
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index ffec8bb01ba8..828eb41c2603 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -90,7 +90,7 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
>
>  SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
>
> -sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|efi32_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
> +sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|startup_64\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|efi32_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|_rodata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
>
>  quiet_cmd_zoffset = ZOFFSET $@
>        cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index f912d7770130..05a75f0a1876 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -59,17 +59,16 @@ start2:
>         cld
>
>         movw    $bugger_off_msg, %si
> +       movw    $bugger_off_msg_size, %cx
>
>  msg_loop:
>         lodsb
> -       andb    %al, %al
> -       jz      bs_die
>         movb    $0xe, %ah
>         movw    $7, %bx
>         int     $0x10
> -       jmp     msg_loop
> +       decw    %cx
> +       jnz     msg_loop
>
> -bs_die:
>         # Allow the user to press a key, then reboot
>         xorw    %ax, %ax
>         int     $0x16
> @@ -89,12 +88,12 @@ bs_die:
>
>         .section ".bsdata", "a"
>  bugger_off_msg:
> -       .ascii  "Use a boot loader.\r\n"
> -       .ascii  "\n"
> -       .ascii  "Remove disk and press any key to reboot...\r\n"
> -       .byte   0
> +       .ascii  "Use a boot loader. "
> +       .ascii  "Press a key to reboot"
> +       .set bugger_off_msg_size, . - bugger_off_msg
>
>  #ifdef CONFIG_EFI_STUB
> +       .align 8
>  pe_header:
>         .long   PE_MAGIC
>
> @@ -108,7 +107,7 @@ coff_header:
>         .set    pe_opt_magic, PE_OPT_MAGIC_PE32PLUS
>         .word   IMAGE_FILE_MACHINE_AMD64
>  #endif
> -       .word   section_count                   # nr_sections
> +       .word   0                               # nr_sections
>         .long   0                               # TimeDateStamp
>         .long   0                               # PointerToSymbolTable
>         .long   1                               # NumberOfSymbols
> @@ -132,7 +131,7 @@ optional_header:
>         # Filled in by build.c
>         .long   0x0000                          # AddressOfEntryPoint
>
> -       .long   0x0200                          # BaseOfCode
> +       .long   0x1000                          # BaseOfCode
>  #ifdef CONFIG_X86_32
>         .long   0                               # data
>  #endif
> @@ -145,8 +144,8 @@ extra_header_fields:
>  #else
>         .quad   image_base                      # ImageBase
>  #endif
> -       .long   0x20                            # SectionAlignment
> -       .long   0x20                            # FileAlignment
> +       .long   0x1000                          # SectionAlignment
> +       .long   0x200                           # FileAlignment
>         .word   0                               # MajorOperatingSystemVersion
>         .word   0                               # MinorOperatingSystemVersion
>         .word   LINUX_EFISTUB_MAJOR_VERSION     # MajorImageVersion
> @@ -189,91 +188,14 @@ extra_header_fields:
>         .quad   0                               # CertificationTable
>         .quad   0                               # BaseRelocationTable
>
> -       # Section table
> -section_table:
> -       #
> -       # The offset & size fields are filled in by build.c.
> -       #
> -       .ascii  ".setup"
> -       .byte   0
> -       .byte   0
> -       .long   0
> -       .long   0x0                             # startup_{32,64}
> -       .long   0                               # Size of initialized data
> -                                               # on disk
> -       .long   0x0                             # startup_{32,64}
> -       .long   0                               # PointerToRelocations
> -       .long   0                               # PointerToLineNumbers
> -       .word   0                               # NumberOfRelocations
> -       .word   0                               # NumberOfLineNumbers
> -       .long   IMAGE_SCN_CNT_CODE              | \
> -               IMAGE_SCN_MEM_READ              | \
> -               IMAGE_SCN_MEM_EXECUTE           | \
> -               IMAGE_SCN_ALIGN_16BYTES         # Characteristics
> -
> -       #
> -       # The EFI application loader requires a relocation section
> -       # because EFI applications must be relocatable. The .reloc
> -       # offset & size fields are filled in by build.c.
>         #
> -       .ascii  ".reloc"
> -       .byte   0
> -       .byte   0
> -       .long   0
> -       .long   0
> -       .long   0                               # SizeOfRawData
> -       .long   0                               # PointerToRawData
> -       .long   0                               # PointerToRelocations
> -       .long   0                               # PointerToLineNumbers
> -       .word   0                               # NumberOfRelocations
> -       .word   0                               # NumberOfLineNumbers
> -       .long   IMAGE_SCN_CNT_INITIALIZED_DATA  | \
> -               IMAGE_SCN_MEM_READ              | \
> -               IMAGE_SCN_MEM_DISCARDABLE       | \
> -               IMAGE_SCN_ALIGN_1BYTES          # Characteristics
> -
> -#ifdef CONFIG_EFI_MIXED
> -       #
> -       # The offset & size fields are filled in by build.c.
> +       # Section table
> +       # It is generated by build.c and here we just need
> +       # to reserve some space for sections
>         #
> -       .asciz  ".compat"
> -       .long   0
> -       .long   0x0
> -       .long   0                               # Size of initialized data
> -                                               # on disk
> -       .long   0x0
> -       .long   0                               # PointerToRelocations
> -       .long   0                               # PointerToLineNumbers
> -       .word   0                               # NumberOfRelocations
> -       .word   0                               # NumberOfLineNumbers
> -       .long   IMAGE_SCN_CNT_INITIALIZED_DATA  | \
> -               IMAGE_SCN_MEM_READ              | \
> -               IMAGE_SCN_MEM_DISCARDABLE       | \
> -               IMAGE_SCN_ALIGN_1BYTES          # Characteristics
> -#endif
> +section_table:
> +       .fill 40*5, 1, 0
>
> -       #
> -       # The offset & size fields are filled in by build.c.
> -       #
> -       .ascii  ".text"
> -       .byte   0
> -       .byte   0
> -       .byte   0
> -       .long   0
> -       .long   0x0                             # startup_{32,64}
> -       .long   0                               # Size of initialized data
> -                                               # on disk
> -       .long   0x0                             # startup_{32,64}
> -       .long   0                               # PointerToRelocations
> -       .long   0                               # PointerToLineNumbers
> -       .word   0                               # NumberOfRelocations
> -       .word   0                               # NumberOfLineNumbers
> -       .long   IMAGE_SCN_CNT_CODE              | \
> -               IMAGE_SCN_MEM_READ              | \
> -               IMAGE_SCN_MEM_EXECUTE           | \
> -               IMAGE_SCN_ALIGN_16BYTES         # Characteristics
> -
> -       .set    section_count, (. - section_table) / 40
>  #endif /* CONFIG_EFI_STUB */
>
>         # Kernel attributes; used by setup.  This is part 1 of the
> diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
> index a3725ad46c5a..dc3a1efb290e 100644
> --- a/arch/x86/boot/tools/build.c
> +++ b/arch/x86/boot/tools/build.c
> @@ -40,6 +40,8 @@ typedef unsigned char  u8;
>  typedef unsigned short u16;
>  typedef unsigned int   u32;
>
> +#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
> +
>  #define DEFAULT_MAJOR_ROOT 0
>  #define DEFAULT_MINOR_ROOT 0
>  #define DEFAULT_ROOT_DEV (DEFAULT_MAJOR_ROOT << 8 | DEFAULT_MINOR_ROOT)
> @@ -59,12 +61,74 @@ u8 buf[SETUP_SECT_MAX*512];
>  #define PECOFF_COMPAT_RESERVE 0x0
>  #endif
>
> +#define PARAGRAPH_SIZE 16
> +#define SECTOR_SIZE 512
> +#define FILE_ALIGNMENT 512
> +#define SECTION_ALIGNMENT 4096
> +
> +#define RELOC_SECTION_SIZE 12
> +
> +#ifdef CONFIG_EFI_MIXED
> +#define COMPAT_SECTION_SIZE 8
> +#else
> +#define COMPAT_SECTION_SIZE 0
> +#endif
> +
> +#define DOS_PECOFF_HEADER_OFFSET 0x3c
> +
> +#define PECOFF_CODE_SIZE_OFFSET 0x1c
> +#define PECOFF_DATA_SIZE_OFFSET 0x20
> +#define PECOFF_IMAGE_SIZE_OFFSET 0x50
> +#define PECOFF_ENTRY_POINT_OFFSET 0x28
> +#define PECOFF_SECTIONS_COUNT_OFFSET 0x6
> +#define PECOFF_BASE_OF_CODE_OFFSET 0x2c
> +
> +#ifdef CONFIG_X86_32
> +#define PECOFF_SECTION_TABLE_OFFSET 0xa8
> +#define PECOFF_RELOC_DIR_OFFSET 0xa0
> +#else
> +#define PECOFF_SECTION_TABLE_OFFSET 0xb8
> +#define PECOFF_RELOC_DIR_OFFSET 0xb0
> +#endif
> +
> +#define PECOFF_SECTION_SIZE 0x28
> +
> +#define PECOFF_SCN_NAME_OFFSET 0x0
> +#define PECOFF_SCN_NAME_SIZE 8
> +#define PECOFF_SCN_MEMSZ_OFFSET 0x8
> +#define PECOFF_SCN_VADDR_OFFSET 0xc
> +#define PECOFF_SCN_FILESZ_OFFSET 0x10
> +#define PECOFF_SCN_OFFSET_OFFSET 0x14
> +#define PECOFF_SCN_FLAGS_OFFSET 0x24
> +
> +#define IMAGE_SCN_CNT_CODE     0x00000020 /* .text */
> +#define IMAGE_SCN_CNT_INITIALIZED_DATA 0x00000040 /* .data */
> +#define IMAGE_SCN_ALIGN_512BYTES 0x00a00000
> +#define IMAGE_SCN_ALIGN_4096BYTES 0x00d00000
> +#define IMAGE_SCN_MEM_DISCARDABLE 0x02000000 /* scn can be discarded */
> +#define IMAGE_SCN_MEM_EXECUTE  0x20000000 /* can be executed as code */
> +#define IMAGE_SCN_MEM_READ     0x40000000 /* readable */
> +#define IMAGE_SCN_MEM_WRITE    0x80000000 /* writeable */
> +

All of those defines need to go into a header file, probably include/linux/pe.h

> +#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_ALIGN_4096BYTES)
> +#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
> +#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
> +#else
> +/* With memory protection disabled all sections are RWX */
> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
> +               IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
> +#define SCN_RX SCN_RW
> +#define SCN_RO SCN_RW
> +#endif
> +
>  static unsigned long efi32_stub_entry;
>  static unsigned long efi64_stub_entry;
>  static unsigned long efi_pe_entry;
>  static unsigned long efi32_pe_entry;
>  static unsigned long kernel_info;
>  static unsigned long startup_64;
> +static unsigned long _rodata;
>  static unsigned long _ehead;
>  static unsigned long _end;
>
> @@ -152,91 +216,126 @@ static void usage(void)
>         die("Usage: build setup system zoffset.h image");
>  }
>
> -#ifdef CONFIG_EFI_STUB
> -
> -static void update_pecoff_section_header_fields(char *section_name, u32 vma, u32 size, u32 datasz, u32 offset)
> +static void *map_file(const char *path, size_t *psize)
>  {
> -       unsigned int pe_header;
> -       unsigned short num_sections;
> -       u8 *section;
> -
> -       pe_header = get_unaligned_le32(&buf[0x3c]);
> -       num_sections = get_unaligned_le16(&buf[pe_header + 6]);
> -
> -#ifdef CONFIG_X86_32
> -       section = &buf[pe_header + 0xa8];
> -#else
> -       section = &buf[pe_header + 0xb8];
> -#endif
> -
> -       while (num_sections > 0) {
> -               if (strncmp((char*)section, section_name, 8) == 0) {
> -                       /* section header size field */
> -                       put_unaligned_le32(size, section + 0x8);
> +       struct stat statbuf;
> +       size_t size;
> +       void *addr;
> +       int fd;
>
> -                       /* section header vma field */
> -                       put_unaligned_le32(vma, section + 0xc);
> +       fd = open(path, O_RDONLY);
> +       if (fd < 0)
> +               die("Unable to open `%s': %m", path);
> +       if (fstat(fd, &statbuf))
> +               die("Unable to stat `%s': %m", path);
>
> -                       /* section header 'size of initialised data' field */
> -                       put_unaligned_le32(datasz, section + 0x10);
> +       size = statbuf.st_size;
> +       /*
> +        * Map one byte more, to allow adding null-terminator
> +        * for text files.
> +        */
> +       addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> +       if (addr == MAP_FAILED)
> +               die("Unable to mmap '%s': %m", path);
>
> -                       /* section header 'file offset' field */
> -                       put_unaligned_le32(offset, section + 0x14);
> +       close(fd);
>
> -                       break;
> -               }
> -               section += 0x28;
> -               num_sections--;
> -       }
> +       *psize = size;
> +       return addr;
>  }
>
> -static void update_pecoff_section_header(char *section_name, u32 offset, u32 size)
> +static void unmap_file(void *addr, size_t size)
>  {
> -       update_pecoff_section_header_fields(section_name, offset, size, size, offset);
> +       munmap(addr, size + 1);
>  }
>
> -static void update_pecoff_setup_and_reloc(unsigned int size)
> +static void *map_output_file(const char *path, size_t size)
>  {
> -       u32 setup_offset = 0x200;
> -       u32 reloc_offset = size - PECOFF_RELOC_RESERVE - PECOFF_COMPAT_RESERVE;
> -#ifdef CONFIG_EFI_MIXED
> -       u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
> -#endif
> -       u32 setup_size = reloc_offset - setup_offset;
> +       void *addr;
> +       int fd;
>
> -       update_pecoff_section_header(".setup", setup_offset, setup_size);
> -       update_pecoff_section_header(".reloc", reloc_offset, PECOFF_RELOC_RESERVE);
> +       fd = open(path, O_RDWR | O_CREAT, 0660);
> +       if (fd < 0)
> +               die("Unable to create `%s': %m", path);
>
> -       /*
> -        * Modify .reloc section contents with a single entry. The
> -        * relocation is applied to offset 10 of the relocation section.
> -        */
> -       put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
> -       put_unaligned_le32(10, &buf[reloc_offset + 4]);
> +       if (ftruncate(fd, size))
> +               die("Unable to resize `%s': %m", path);
>
> -#ifdef CONFIG_EFI_MIXED
> -       update_pecoff_section_header(".compat", compat_offset, PECOFF_COMPAT_RESERVE);
> +       addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +       if (addr == MAP_FAILED)
> +               die("Unable to mmap '%s': %m", path);
>
> -       /*
> -        * Put the IA-32 machine type (0x14c) and the associated entry point
> -        * address in the .compat section, so loaders can figure out which other
> -        * execution modes this image supports.
> -        */
> -       buf[compat_offset] = 0x1;
> -       buf[compat_offset + 1] = 0x8;
> -       put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
> -       put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset + 4]);
> -#endif
> +       return addr;
>  }
>
> -static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
> -                              unsigned int init_sz)
> +#ifdef CONFIG_EFI_STUB
> +
> +static unsigned int reloc_offset;
> +static unsigned int compat_offset;
> +
> +#define MAX_SECTIONS 5
> +
> +static void emit_pecoff_section(const char *section_name, unsigned int size,
> +                               unsigned int bss, unsigned int *file_offset,
> +                               unsigned int *mem_offset, u32 flags)
>  {
> +       unsigned int section_memsz, section_filesz;
>         unsigned int pe_header;
> -       unsigned int text_sz = file_sz - text_start;
> -       unsigned int bss_sz = init_sz - file_sz;
> +       unsigned short num_sections;
> +       u8 *pnum_sections;
> +       u8 *section;
> +
> +       pe_header = get_unaligned_le32(&buf[DOS_PECOFF_HEADER_OFFSET]);
> +       pnum_sections = &buf[pe_header + PECOFF_SECTIONS_COUNT_OFFSET];
> +       num_sections = get_unaligned_le16(pnum_sections);
> +       if (num_sections >= MAX_SECTIONS)
> +               die("Not enough space to generate all sections");
> +
> +       section = &buf[pe_header + PECOFF_SECTION_TABLE_OFFSET];
> +       section += PECOFF_SECTION_SIZE * num_sections;
> +
> +       if ((size & (FILE_ALIGNMENT - 1)) || (bss & (FILE_ALIGNMENT - 1)))
> +               die("Section '%s' is improperly aligned", section_name);
> +
> +       section_memsz = round_up(size + bss, SECTION_ALIGNMENT);
> +       section_filesz = round_up(size, FILE_ALIGNMENT);
> +
> +       /* Zero out all section fields */
> +       memset(section, 0, PECOFF_SECTION_SIZE);
> +
> +       /* Section header size field */
> +       strncpy((char *)(section + PECOFF_SCN_NAME_OFFSET),
> +               section_name, PECOFF_SCN_NAME_SIZE);
>
> -       pe_header = get_unaligned_le32(&buf[0x3c]);
> +       put_unaligned_le32(section_memsz, section + PECOFF_SCN_MEMSZ_OFFSET);
> +       put_unaligned_le32(*mem_offset, section + PECOFF_SCN_VADDR_OFFSET);
> +       put_unaligned_le32(section_filesz, section + PECOFF_SCN_FILESZ_OFFSET);
> +       put_unaligned_le32(*file_offset, section + PECOFF_SCN_OFFSET_OFFSET);
> +       put_unaligned_le32(flags, section + PECOFF_SCN_FLAGS_OFFSET);
> +
> +       put_unaligned_le16(num_sections + 1, pnum_sections);
> +
> +       *mem_offset += section_memsz;
> +       *file_offset += section_filesz;
> +
> +}
> +
> +#define BASE_RVA 0x1000
> +
> +static unsigned int update_pecoff_sections(unsigned int setup_size,
> +                                          unsigned int file_size,
> +                                          unsigned int init_size,
> +                                          unsigned int text_size)
> +{
> +       /* First section starts at 512 byes, after PE header */
> +       unsigned int mem_offset = BASE_RVA, file_offset = SECTOR_SIZE;
> +       unsigned int compat_size, reloc_size, image_size, text_rva;
> +       unsigned int pe_header, bss_size, text_rva_diff, reloc_rva;
> +
> +       pe_header = get_unaligned_le32(&buf[DOS_PECOFF_HEADER_OFFSET]);
> +
> +       if (get_unaligned_le32(&buf[pe_header + PECOFF_SECTIONS_COUNT_OFFSET]))
> +               die("Some sections present in PE file");
>
>         /*
>          * The PE/COFF loader may load the image at an address which is
> @@ -247,42 +346,121 @@ static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
>          * add slack to allow the buffer to be aligned within the declared size
>          * of the image.
>          */
> -       bss_sz  += CONFIG_PHYSICAL_ALIGN;
> -       init_sz += CONFIG_PHYSICAL_ALIGN;
> +       init_size += CONFIG_PHYSICAL_ALIGN;
> +       image_size = init_size;
> +
> +       reloc_size = round_up(RELOC_SECTION_SIZE, FILE_ALIGNMENT);
> +       compat_size = round_up(COMPAT_SECTION_SIZE, FILE_ALIGNMENT);
> +
> +       /*
> +        * Let's remove extra memory used by special sections
> +        * and use it as a part of bss.
> +        */
> +       init_size -= round_up(reloc_size, SECTION_ALIGNMENT);
> +       init_size -= round_up(compat_size, SECTION_ALIGNMENT);
> +       if (init_size < file_size + setup_size) {
> +               init_size = file_size + setup_size;
> +               image_size += round_up(reloc_size, SECTION_ALIGNMENT);
> +               image_size += round_up(compat_size, SECTION_ALIGNMENT);
> +       }
>
>         /*
> -        * Size of code: Subtract the size of the first sector (512 bytes)
> -        * which includes the header.
> +        * Update sections offsets.
> +        * NOTE: Order is important
>          */
> -       put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header + 0x1c]);
>
> -       /* Size of image */
> -       put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
> +       bss_size = init_size - file_size - setup_size;
> +
> +       emit_pecoff_section(".setup", setup_size - SECTOR_SIZE, 0,
> +                           &file_offset, &mem_offset, SCN_RO |
> +                           IMAGE_SCN_CNT_INITIALIZED_DATA);
> +
> +       text_rva_diff = mem_offset - file_offset;
> +       text_rva = mem_offset;
> +       emit_pecoff_section(".text", text_size, 0,
> +                           &file_offset, &mem_offset, SCN_RX |
> +                           IMAGE_SCN_CNT_CODE);
> +
> +       /* Check that kernel sections mapping is contiguous */
> +       if (text_rva_diff != mem_offset - file_offset)
> +               die("Kernel sections mapping is wrong: %#x != %#x",
> +                   mem_offset - file_offset, text_rva_diff);
> +
> +       emit_pecoff_section(".data", file_size - text_size, bss_size,
> +                           &file_offset, &mem_offset, SCN_RW |
> +                           IMAGE_SCN_CNT_INITIALIZED_DATA);
> +
> +       reloc_offset = file_offset;
> +       reloc_rva = mem_offset;
> +       emit_pecoff_section(".reloc", reloc_size, 0,
> +                           &file_offset, &mem_offset, SCN_RW |
> +                           IMAGE_SCN_CNT_INITIALIZED_DATA |
> +                           IMAGE_SCN_MEM_DISCARDABLE);
> +
> +       compat_offset = file_offset;
> +#ifdef CONFIG_EFI_MIXED
> +       emit_pecoff_section(".comat", compat_size, 0,
> +                           &file_offset, &mem_offset, SCN_RW |
> +                           IMAGE_SCN_CNT_INITIALIZED_DATA |
> +                           IMAGE_SCN_MEM_DISCARDABLE);
> +#endif
> +
> +       if (file_size + setup_size + reloc_size + compat_size != file_offset)
> +               die("file_size(%#x) != filesz(%#x)",
> +                   file_size + setup_size + reloc_size + compat_size, file_offset);
> +
> +       /* Size of code. */
> +       put_unaligned_le32(round_up(text_size, SECTION_ALIGNMENT),
> +                          &buf[pe_header + PECOFF_CODE_SIZE_OFFSET]);
> +       /*
> +        * Size of data.
> +        * Exclude text size and first sector, which contains PE header.
> +        */
> +       put_unaligned_le32(mem_offset - round_up(text_size, SECTION_ALIGNMENT),
> +                          &buf[pe_header + PECOFF_DATA_SIZE_OFFSET]);
> +
> +       /* Size of image. */
> +       put_unaligned_le32(mem_offset, &buf[pe_header + PECOFF_IMAGE_SIZE_OFFSET]);
>
>         /*
>          * Address of entry point for PE/COFF executable
>          */
> -       put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header + 0x28]);
> +       put_unaligned_le32(text_rva + efi_pe_entry, &buf[pe_header + PECOFF_ENTRY_POINT_OFFSET]);
>
> -       update_pecoff_section_header_fields(".text", text_start, text_sz + bss_sz,
> -                                           text_sz, text_start);
> -}
> +       /*
> +        * BaseOfCode for PE/COFF executable
> +        */
> +       put_unaligned_le32(text_rva, &buf[pe_header + PECOFF_BASE_OF_CODE_OFFSET]);
>
> -static int reserve_pecoff_reloc_section(int c)
> -{
> -       /* Reserve 0x20 bytes for .reloc section */
> -       memset(buf+c, 0, PECOFF_RELOC_RESERVE);
> -       return PECOFF_RELOC_RESERVE;
> +       /*
> +        * Since we have generated .reloc section, we need to
> +        * fill-in Reloc directory
> +        */
> +       put_unaligned_le32(reloc_rva, &buf[pe_header + PECOFF_RELOC_DIR_OFFSET]);
> +       put_unaligned_le32(RELOC_SECTION_SIZE, &buf[pe_header + PECOFF_RELOC_DIR_OFFSET + 4]);
> +
> +       return file_offset;
>  }
>
> -static void efi_stub_defaults(void)
> +static void generate_pecoff_section_data(u8 *output, unsigned int setup_size)
>  {
> -       /* Defaults for old kernel */
> -#ifdef CONFIG_X86_32
> -       efi_pe_entry = 0x10;
> -#else
> -       efi_pe_entry = 0x210;
> -       startup_64 = 0x200;
> +       /*
> +        * Modify .reloc section contents with a two entries. The
> +        * relocation is applied to offset 10 of the relocation section.
> +        */
> +       put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE, &output[reloc_offset]);
> +       put_unaligned_le32(RELOC_SECTION_SIZE, &output[reloc_offset + 4]);
> +
> +#ifdef CONFIG_EFI_MIXED
> +       /*
> +        * Put the IA-32 machine type (0x14c) and the associated entry point
> +        * address in the .compat section, so loaders can figure out which other
> +        * execution modes this image supports.
> +        */
> +       output[compat_offset] = 0x1;
> +       output[compat_offset + 1] = 0x8;
> +       put_unaligned_le16(0x14c, &output[compat_offset + 2]);
> +       put_unaligned_le32(efi32_pe_entry + setup_size, &output[compat_offset + 4]);
>  #endif
>  }
>
> @@ -297,33 +475,27 @@ static void efi_stub_entry_update(void)
>
>  #ifdef CONFIG_EFI_MIXED
>         if (efi32_stub_entry != addr)
> -               die("32-bit and 64-bit EFI entry points do not match\n");
> +               die("32-bit and 64-bit EFI entry points do not match");
>  #endif
>         put_unaligned_le32(addr, &buf[0x264]);
>  }
>
> +static void efi_stub_update_defaults(void)
> +{
> +       /* Defaults for old kernel */
> +#ifdef CONFIG_X86_32
> +       efi_pe_entry = 0x10;
> +#else
> +       efi_pe_entry = 0x210;
> +       startup_64 = 0x200;
> +#endif
> +}
>  #else
>
> -static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
> -static inline void update_pecoff_text(unsigned int text_start,
> -                                     unsigned int file_sz,
> -                                     unsigned int init_sz) {}
> -static inline void efi_stub_defaults(void) {}
> -static inline void efi_stub_entry_update(void) {}
> +static void efi_stub_update_defaults(void) {}
>
> -static inline int reserve_pecoff_reloc_section(int c)
> -{
> -       return 0;
> -}
>  #endif /* CONFIG_EFI_STUB */
>
> -static int reserve_pecoff_compat_section(int c)
> -{
> -       /* Reserve 0x20 bytes for .compat section */
> -       memset(buf+c, 0, PECOFF_COMPAT_RESERVE);
> -       return PECOFF_COMPAT_RESERVE;
> -}
> -
>  /*
>   * Parse zoffset.h and find the entry points. We could just #include zoffset.h
>   * but that would mean tools/build would have to be rebuilt every time. It's
> @@ -336,20 +508,15 @@ static int reserve_pecoff_compat_section(int c)
>
>  static void parse_zoffset(char *fname)
>  {
> -       FILE *file;
> -       char *p;
> -       int c;
> +       size_t size;
> +       char *data, *p;
>
> -       file = fopen(fname, "r");
> -       if (!file)
> -               die("Unable to open `%s': %m", fname);
> -       c = fread(buf, 1, sizeof(buf) - 1, file);
> -       if (ferror(file))
> -               die("read-error on `zoffset.h'");
> -       fclose(file);
> -       buf[c] = 0;
> +       data = map_file(fname, &size);
> +
> +       /* We can do that, since we mapped one byte more */
> +       data[size] = 0;
>
> -       p = (char *)buf;
> +       p = (char *)data;
>
>         while (p && *p) {
>                 PARSE_ZOFS(p, efi32_stub_entry);
> @@ -358,6 +525,7 @@ static void parse_zoffset(char *fname)
>                 PARSE_ZOFS(p, efi32_pe_entry);
>                 PARSE_ZOFS(p, kernel_info);
>                 PARSE_ZOFS(p, startup_64);
> +               PARSE_ZOFS(p, _rodata);
>                 PARSE_ZOFS(p, _ehead);
>                 PARSE_ZOFS(p, _end);
>
> @@ -365,82 +533,93 @@ static void parse_zoffset(char *fname)
>                 while (p && (*p == '\r' || *p == '\n'))
>                         p++;
>         }
> +
> +       unmap_file(data, size);
>  }
>
> -int main(int argc, char ** argv)
> +static unsigned int read_setup(char *path)
>  {
> -       unsigned int i, sz, setup_sectors, init_sz;
> -       int c;
> -       u32 sys_size;
> -       struct stat sb;
> -       FILE *file, *dest;
> -       int fd;
> -       void *kernel;
> -       u32 crc = 0xffffffffUL;
> -
> -       efi_stub_defaults();
> -
> -       if (argc != 5)
> -               usage();
> -       parse_zoffset(argv[3]);
> -
> -       dest = fopen(argv[4], "w");
> -       if (!dest)
> -               die("Unable to write `%s': %m", argv[4]);
> +       FILE *file;
> +       unsigned int setup_size, file_size;
>
>         /* Copy the setup code */
> -       file = fopen(argv[1], "r");
> +       file = fopen(path, "r");
>         if (!file)
> -               die("Unable to open `%s': %m", argv[1]);
> -       c = fread(buf, 1, sizeof(buf), file);
> +               die("Unable to open `%s': %m", path);
> +
> +       file_size = fread(buf, 1, sizeof(buf), file);
>         if (ferror(file))
>                 die("read-error on `setup'");
> -       if (c < 1024)
> +
> +       if (file_size < 2 * SECTOR_SIZE)
>                 die("The setup must be at least 1024 bytes");
> -       if (get_unaligned_le16(&buf[510]) != 0xAA55)
> +
> +       if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
>                 die("Boot block hasn't got boot flag (0xAA55)");
> -       fclose(file);
>
> -       c += reserve_pecoff_compat_section(c);
> -       c += reserve_pecoff_reloc_section(c);
> +       fclose(file);
>
>         /* Pad unused space with zeros */
> -       setup_sectors = (c + 511) / 512;
> -       if (setup_sectors < SETUP_SECT_MIN)
> -               setup_sectors = SETUP_SECT_MIN;
> -       i = setup_sectors*512;
> -       memset(buf+c, 0, i-c);
> +       setup_size = round_up(file_size, SECTOR_SIZE);
> +
> +       if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
> +               setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
>
> -       update_pecoff_setup_and_reloc(i);
> +       /*
> +        * Global buffer is already initialised
> +        * to 0, but just in case, zero out padding.
> +        */
> +
> +       memset(buf + file_size, 0, setup_size - file_size);
> +
> +       return setup_size;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +       size_t kern_file_size;
> +       unsigned int setup_size;
> +       unsigned int setup_sectors;
> +       unsigned int init_size;
> +       unsigned int total_size;
> +       unsigned int kern_size;
> +       void *kernel;
> +       u32 crc = 0xffffffffUL;
> +       u8 *output;
> +
> +       if (argc != 5)
> +               usage();
> +
> +       efi_stub_update_defaults();
> +       parse_zoffset(argv[3]);
> +
> +       setup_size = read_setup(argv[1]);
> +
> +       setup_sectors = setup_size/SECTOR_SIZE;
>
>         /* Set the default root device */
>         put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
>
> -       /* Open and stat the kernel file */
> -       fd = open(argv[2], O_RDONLY);
> -       if (fd < 0)
> -               die("Unable to open `%s': %m", argv[2]);
> -       if (fstat(fd, &sb))
> -               die("Unable to stat `%s': %m", argv[2]);
> -       sz = sb.st_size;
> -       kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
> -       if (kernel == MAP_FAILED)
> -               die("Unable to mmap '%s': %m", argv[2]);
> -       /* Number of 16-byte paragraphs, including space for a 4-byte CRC */
> -       sys_size = (sz + 15 + 4) / 16;
> +       /* Map kernel file to memory */
> +       kernel = map_file(argv[2], &kern_file_size);
> +
>  #ifdef CONFIG_EFI_STUB
> -       /*
> -        * COFF requires minimum 32-byte alignment of sections, and
> -        * adding a signature is problematic without that alignment.
> -        */
> -       sys_size = (sys_size + 1) & ~1;
> +       /* PE specification require 512-byte minimum section file alignment */
> +       kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
> +#else
> +       /* Number of 16-byte paragraphs, including space for a 4-byte CRC */
> +       kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
>  #endif
>
>         /* Patch the setup code with the appropriate size parameters */
> -       buf[0x1f1] = setup_sectors-1;
> -       put_unaligned_le32(sys_size, &buf[0x1f4]);
> +       buf[0x1f1] = setup_sectors - 1;
> +       put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
> +
> +       /* Update kernel_info offset. */
> +       put_unaligned_le32(kernel_info, &buf[0x268]);
> +
> +       init_size = get_unaligned_le32(&buf[0x260]);
>
> -       init_sz = get_unaligned_le32(&buf[0x260]);
>  #ifdef CONFIG_EFI_STUB
>         /*
>          * The decompression buffer will start at ImageBase. When relocating
> @@ -456,45 +635,39 @@ int main(int argc, char ** argv)
>          * For future-proofing, increase init_sz if necessary.
>          */
>
> -       if (init_sz - _end < i + _ehead) {
> -               init_sz = (i + _ehead + _end + 4095) & ~4095;
> -               put_unaligned_le32(init_sz, &buf[0x260]);
> +       if (init_size - _end < setup_size + _ehead) {
> +               init_size = round_up(setup_size + _ehead + _end, SECTION_ALIGNMENT);
> +               put_unaligned_le32(init_size, &buf[0x260]);
>         }
> -#endif
> -       update_pecoff_text(setup_sectors * 512, i + (sys_size * 16), init_sz);
>
> -       efi_stub_entry_update();
> +       total_size = update_pecoff_sections(setup_size, kern_size, init_size, _rodata);
>
> -       /* Update kernel_info offset. */
> -       put_unaligned_le32(kernel_info, &buf[0x268]);
> +       efi_stub_entry_update();
> +#else
> +       (void)init_size;
> +       total_size = setup_size + kern_size;
> +#endif
>
> -       crc = partial_crc32(buf, i, crc);
> -       if (fwrite(buf, 1, i, dest) != i)
> -               die("Writing setup failed");
> +       output = map_output_file(argv[4], total_size);
>
> -       /* Copy the kernel code */
> -       crc = partial_crc32(kernel, sz, crc);
> -       if (fwrite(kernel, 1, sz, dest) != sz)
> -               die("Writing kernel failed");
> +       memcpy(output, buf, setup_size);
> +       memcpy(output + setup_size, kernel, kern_file_size);
> +       memset(output + setup_size + kern_file_size, 0, kern_size - kern_file_size);
>
> -       /* Add padding leaving 4 bytes for the checksum */
> -       while (sz++ < (sys_size*16) - 4) {
> -               crc = partial_crc32_one('\0', crc);
> -               if (fwrite("\0", 1, 1, dest) != 1)
> -                       die("Writing padding failed");
> -       }
> +#ifdef CONFIG_EFI_STUB
> +       generate_pecoff_section_data(output, setup_size);
> +#endif
>
> -       /* Write the CRC */
> -       put_unaligned_le32(crc, buf);
> -       if (fwrite(buf, 1, 4, dest) != 4)
> -               die("Writing CRC failed");
> +       /* Calculate and write kernel checksum. */
> +       crc = partial_crc32(output, total_size - 4, crc);
> +       put_unaligned_le32(crc, &output[total_size - 4]);
>
> -       /* Catch any delayed write failures */
> -       if (fclose(dest))
> -               die("Writing image failed");
> +       /* Catch any delayed write failures. */
> +       if (munmap(output, total_size) < 0)
> +               die("Writing kernel failed");
>
> -       close(fd);
> +       unmap_file(kernel, kern_file_size);
>
> -       /* Everything is OK */
> +       /* Everything is OK. */
>         return 0;
>  }
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 680184034cb7..914106d547a6 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -392,6 +392,60 @@ static void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
>                 asm("hlt");
>  }
>
> +
> +/*
> + * Manually setup memory protection attributes for each ELF section
> + * since we cannot do it properly by using PE sections.
> + */
> +static void setup_sections_memory_protection(void *image_base,
> +                                            unsigned long init_size)
> +{
> +#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
> +       efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> +
> +       if (!efi_dxe_table ||
> +           efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
> +               efi_warn("Unable to locate EFI DXE services table\n");
> +               efi_dxe_table = NULL;
> +               return;
> +       }
> +
> +       extern char _head[], _ehead[];
> +       extern char _compressed[], _ecompressed[];
> +       extern char _text[], _etext[];
> +       extern char _rodata[], _erodata[];
> +       extern char _data[];
> +
> +       /* .setup [image_base, _head] */
> +       efi_adjust_memory_range_protection((unsigned long)image_base,
> +                                          (unsigned long)_head - (unsigned long)image_base,
> +                                          EFI_MEMORY_RO | EFI_MEMORY_XP);
> +       /* .head.text [_head, _ehead] */
> +       efi_adjust_memory_range_protection((unsigned long)_head,
> +                                          (unsigned long)_ehead - (unsigned long)_head,
> +                                          EFI_MEMORY_RO);
> +       /* .rodata..compressed [_compressed, _ecompressed] */
> +       efi_adjust_memory_range_protection((unsigned long)_compressed,
> +                                          (unsigned long)_ecompressed - (unsigned long)_compressed,
> +                                          EFI_MEMORY_RO | EFI_MEMORY_XP);
> +       /* .text [_text, _etext] */
> +       efi_adjust_memory_range_protection((unsigned long)_text,
> +                                          (unsigned long)_etext - (unsigned long)_text,
> +                                          EFI_MEMORY_RO);
> +       /* .rodata [_rodata, _erodata] */
> +       efi_adjust_memory_range_protection((unsigned long)_rodata,
> +                                          (unsigned long)_erodata - (unsigned long)_rodata,
> +                                          EFI_MEMORY_RO | EFI_MEMORY_XP);
> +       /* .data, .bss [_data, image_base + init_size] */
> +       efi_adjust_memory_range_protection((unsigned long)_data,
> +                                          (unsigned long)image_base + init_size - (unsigned long)_rodata,
> +                                          EFI_MEMORY_XP);
> +#else
> +       (void)image_base;
> +       (void)init_size;
> +#endif
> +}
> +
>  void __noreturn efi_stub_entry(efi_handle_t handle,
>                                efi_system_table_t *sys_table_arg,
>                                struct boot_params *boot_params);
> @@ -438,10 +492,15 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
>
>         hdr = &boot_params->hdr;
>
> -       /* Copy the setup header from the second sector to boot_params */
> -       memcpy(&hdr->jump, image_base + 512,
> +       /*
> +        * Copy the setup header from the second sector
> +        * (mapped to image_base + 0x1000) to boot_params
> +        */
> +       memcpy(&hdr->jump, image_base + 0x1000,
>                sizeof(struct setup_header) - offsetof(struct setup_header, jump));
>
> +       setup_sections_memory_protection(image_base, hdr->init_size);
> +
>         /*
>          * Fill out some of the header fields ourselves because the
>          * EFI firmware loader doesn't load the first sector.
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 15/16] efi/libstub: Add memory attribute protocol definitions
  2022-09-06 10:41 ` [PATCH 15/16] efi/libstub: Add memory attribute protocol definitions Evgeniy Baskov
@ 2022-10-19  7:39   ` Ard Biesheuvel
  0 siblings, 0 replies; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:39 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> EFI_MEMORY_ATTRIBUTE_PROTOCOL servers as a better alternative to
> DXE services for setting memory attributes in EFI Boot Services
> environment. This protocol is better since it is a part of UEFI
> specification itself and not UEFI PI specification like DXE
> services.
>
> Add EFI_MEMORY_ATTRIBUTE_PROTOCOL definitions.
> Support mixed mode properly for its calls.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

Acked-by: Ard Biesheuvel <ardb@kernel.org>

> ---
>  arch/x86/include/asm/efi.h             |  7 +++++++
>  drivers/firmware/efi/libstub/efistub.h | 22 ++++++++++++++++++++++
>  include/linux/efi.h                    |  1 +
>  3 files changed, 30 insertions(+)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 233ae6986d6f..522ff2e443b3 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -325,6 +325,13 @@ static inline u32 efi64_convert_status(efi_status_t status)
>  #define __efi64_argmap_set_memory_space_attributes(phys, size, flags) \
>         (__efi64_split(phys), __efi64_split(size), __efi64_split(flags))
>
> +/* Memory Attribute Protocol */
> +#define __efi64_argmap_set_memory_attributes(protocol, phys, size, flags) \
> +       ((protocol), __efi64_split(phys), __efi64_split(size), __efi64_split(flags))
> +
> +#define __efi64_argmap_clear_memory_attributes(protocol, phys, size, flags) \
> +       ((protocol), __efi64_split(phys), __efi64_split(size), __efi64_split(flags))
> +
>  /*
>   * The macros below handle the plumbing for the argument mapping. To add a
>   * mapping for a specific EFI method, simply define a macro
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index cdd1bb50c786..87973f104731 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -39,6 +39,9 @@ extern const efi_system_table_t *efi_system_table;
>  typedef union efi_dxe_services_table efi_dxe_services_table_t;
>  extern const efi_dxe_services_table_t *efi_dxe_table;
>
> +typedef union efi_memory_attribute_protocol efi_memory_attribute_protocol_t;
> +extern efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
> +
>  efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
>                                    efi_system_table_t *sys_table_arg);
>
> @@ -403,6 +406,25 @@ union efi_dxe_services_table {
>         } mixed_mode;
>  };
>
> +union  efi_memory_attribute_protocol {
> +       struct {
> +               void *get_memory_attributes;
> +               efi_status_t (__efiapi *set_memory_attributes)(efi_memory_attribute_protocol_t *,
> +                                                               efi_physical_addr_t,
> +                                                               u64,
> +                                                               u64);
> +               efi_status_t (__efiapi *clear_memory_attributes)(efi_memory_attribute_protocol_t *,
> +                                                                 efi_physical_addr_t,
> +                                                                 u64,
> +                                                                 u64);
> +       };
> +       struct {
> +               u32 get_memory_attributes;
> +               u32 set_memory_attributes;
> +               u32 clear_memory_attributes;
> +       } mixed_mode;
> +};
> +
>  typedef union efi_uga_draw_protocol efi_uga_draw_protocol_t;
>
>  union efi_uga_draw_protocol {
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index d2b84c2fec39..d32368a32285 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -386,6 +386,7 @@ void efi_native_runtime_setup(void);
>  #define EFI_LOAD_FILE2_PROTOCOL_GUID           EFI_GUID(0x4006c0c1, 0xfcb3, 0x403e,  0x99, 0x6d, 0x4a, 0x6c, 0x87, 0x24, 0xe0, 0x6d)
>  #define EFI_RT_PROPERTIES_TABLE_GUID           EFI_GUID(0xeb66918a, 0x7eef, 0x402a,  0x84, 0x2e, 0x93, 0x1d, 0x21, 0xc3, 0x8a, 0xe9)
>  #define EFI_DXE_SERVICES_TABLE_GUID            EFI_GUID(0x05ad34ba, 0x6f02, 0x4214,  0x95, 0x2e, 0x4d, 0xa0, 0x39, 0x8e, 0x2b, 0xb9)
> +#define EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID     EFI_GUID(0xf4560cf6, 0x40ec, 0x4b4a,  0xa1, 0x92, 0xbf, 0x1d, 0x57, 0xd0, 0xb1, 0x89)
>
>  #define EFI_IMAGE_SECURITY_DATABASE_GUID       EFI_GUID(0xd719b2cb, 0x3d3a, 0x4596,  0xa3, 0xbc, 0xda, 0xd0, 0x0e, 0x67, 0x65, 0x6f)
>  #define EFI_SHIM_LOCK_GUID                     EFI_GUID(0x605dab50, 0xe046, 0x4300,  0xab, 0xb6, 0x3d, 0xd8, 0x10, 0xdd, 0x8b, 0x23)
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 16/16] efi/libstub: Use memory attribute protocol
  2022-09-06 10:41 ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Evgeniy Baskov
  2022-10-18 20:51   ` [PATCH] efi/libstub: make memory protection warnings include newlines Peter Jones
@ 2022-10-19  7:42   ` Ard Biesheuvel
  2022-10-20 13:13     ` Evgeniy Baskov
  1 sibling, 1 reply; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:42 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>
> Add EFI_MEMORY_ATTRIBUTE_PROTOCOL as preferred alternative to DXE
> services for changing memory attributes in the EFISTUB.
>
> Use DXE services only as a fallback in case aforementioned protocol
> is not supported by UEFI implementation.
>
> Move DXE services initialization code closer to the place they are used
> to match EFI_MEMORY_ATTRIBUTE_PROTOCOL initialization code.
>
> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> ---
>  drivers/firmware/efi/libstub/mem.c      | 166 ++++++++++++++++++------
>  drivers/firmware/efi/libstub/x86-stub.c |  17 ---
>  2 files changed, 127 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
> index 89ebc8ad2c22..8c8782993b30 100644
> --- a/drivers/firmware/efi/libstub/mem.c
> +++ b/drivers/firmware/efi/libstub/mem.c
> @@ -5,6 +5,9 @@
>
>  #include "efistub.h"
>
> +const efi_dxe_services_table_t *efi_dxe_table;
> +efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
> +
>  static inline bool mmap_has_headroom(unsigned long buff_size,
>                                      unsigned long map_size,
>                                      unsigned long desc_size)
> @@ -131,50 +134,32 @@ void efi_free(unsigned long size, unsigned long addr)
>         efi_bs_call(free_pages, addr, nr_pages);
>  }
>
> -/**
> - * efi_adjust_memory_range_protection() - change memory range protection attributes
> - * @start:     memory range start address
> - * @size:      memory range size
> - *
> - * Actual memory range for which memory attributes are modified is
> - * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
> - * that includes [start, start + size].
> - *
> - * @return: status code
> - */
> -efi_status_t efi_adjust_memory_range_protection(unsigned long start,
> -                                               unsigned long size,
> -                                               unsigned long attributes)
> +static void retrive_dxe_table(void)

retrieve

> +{
> +       efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> +       if (efi_dxe_table &&
> +           efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
> +               efi_warn("Ignoring DXE services table: invalid signature\n");
> +               efi_dxe_table = NULL;
> +       }
> +}
> +
> +static efi_status_t adjust_mem_attrib_dxe(efi_physical_addr_t rounded_start,
> +                                         efi_physical_addr_t rounded_end,
> +                                         unsigned long attributes)
>  {
>         efi_status_t status;
>         efi_gcd_memory_space_desc_t desc;
> -       efi_physical_addr_t end, next;
> -       efi_physical_addr_t rounded_start, rounded_end;
> +       efi_physical_addr_t end, next, start;
>         efi_physical_addr_t unprotect_start, unprotect_size;
>         int has_system_memory = 0;
>
> -       if (efi_dxe_table == NULL)
> -               return EFI_UNSUPPORTED;
> +       if (!efi_dxe_table) {
> +               retrive_dxe_table();

Same here

>
> -       /*
> -        * This function should not be used to modify attributes
> -        * other than writable/executable.
> -        */
> -
> -       if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
> -               return EFI_INVALID_PARAMETER;
> -
> -       /*
> -        * Disallow simultaniously executable and writable memory
> -        * to inforce W^X policy if direct extraction code is enabled.
> -        */
> -
> -       if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
> -           IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))
> -               return EFI_INVALID_PARAMETER;
> -
> -       rounded_start = rounddown(start, EFI_PAGE_SIZE);
> -       rounded_end = roundup(start + size, EFI_PAGE_SIZE);
> +               if (!efi_dxe_table)
> +                       return EFI_UNSUPPORTED;
> +       }
>
>         /*
>          * Don't modify memory region attributes, they are
> @@ -182,14 +167,15 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>          * encounter firmware bugs.
>          */
>
> -       for (end = start + size; start < end; start = next) {
> +
> +       for (start = rounded_start, end = rounded_end; start < end; start = next) {
>
>                 status = efi_dxe_call(get_memory_space_descriptor,
>                                       start, &desc);
>
>                 if (status != EFI_SUCCESS) {
>                         efi_warn("Unable to get memory descriptor at %lx\n",
> -                                start);
> +                                (unsigned long)start);
>                         return status;
>                 }
>
> @@ -231,3 +217,105 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>
>         return EFI_SUCCESS;
>  }
> +
> +static void retrive_memory_attributes_proto(void)

and here

> +{
> +       efi_status_t status;
> +       efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
> +
> +       status = efi_bs_call(locate_protocol, &guid, NULL,
> +                            (void **)&efi_mem_attrib_proto);
> +       if (status != EFI_SUCCESS)
> +               efi_mem_attrib_proto = NULL;
> +}
> +
> +/**
> + * efi_adjust_memory_range_protection() - change memory range protection attributes
> + * @start:     memory range start address
> + * @size:      memory range size
> + *
> + * Actual memory range for which memory attributes are modified is
> + * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
> + * that includes [start, start + size].
> + *
> + * This function first attempts to use EFI_MEMORY_ATTRIBUTE_PROTOCOL,
> + * that is a part of UEFI Specification since version 2.10.
> + * If the protocol is unavailable it falls back to DXE services functions.
> + *
> + * @return: status code
> + */
> +efi_status_t efi_adjust_memory_range_protection(unsigned long start,
> +                                               unsigned long size,
> +                                               unsigned long attributes)
> +{
> +       efi_status_t status;
> +       efi_physical_addr_t rounded_start, rounded_end;
> +       unsigned long attr_clear;
> +
> +       /*
> +        * This function should not be used to modify attributes
> +        * other than writable/executable.
> +        */
> +
> +       if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
> +               return EFI_INVALID_PARAMETER;
> +
> +       /*
> +        * Disallow simultaniously executable and writable memory

simultaneously

> +        * to inforce W^X policy if direct extraction code is enabled.

enforce

> +        */
> +
> +       if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
> +           IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))

efi_adjust_memory_range_protection() is a generic routine, but here it
depends on a x86-specific Kconfig symbol. Is that really necessary?

> +               return EFI_INVALID_PARAMETER;
> +
> +       rounded_start = rounddown(start, EFI_PAGE_SIZE);
> +       rounded_end = roundup(start + size, EFI_PAGE_SIZE);
> +
> +       if (!efi_mem_attrib_proto) {
> +               retrive_memory_attributes_proto();

retrieve

> +
> +               /* Fall back to DXE services if unsupported */
> +               if (!efi_mem_attrib_proto) {
> +                       return adjust_mem_attrib_dxe(rounded_start,
> +                                                    rounded_end,
> +                                                    attributes);
> +               }
> +       }
> +
> +       /*
> +        * Unlike DXE services functions, EFI_MEMORY_ATTRIBUTE_PROTOCOL
> +        * does not clear unset protection bit, so it needs to be cleared
> +        * explcitly
> +        */
> +
> +       attr_clear = ~attributes &
> +                    (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);
> +
> +       status = efi_call_proto(efi_mem_attrib_proto,
> +                               clear_memory_attributes,
> +                               rounded_start,
> +                               rounded_end - rounded_start,
> +                               attr_clear);
> +       if (status != EFI_SUCCESS) {
> +               efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx",

Need \n at the end here

> +                        (unsigned long)rounded_start,
> +                        (unsigned long)rounded_end,
> +                        status);
> +               return status;
> +       }
> +
> +       status = efi_call_proto(efi_mem_attrib_proto,
> +                               set_memory_attributes,
> +                               rounded_start,
> +                               rounded_end - rounded_start,
> +                               attributes);
> +       if (status != EFI_SUCCESS) {
> +               efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx",

and here

> +                        (unsigned long)rounded_start,
> +                        (unsigned long)rounded_end,
> +                        status);
> +       }
> +
> +       return status;
> +}
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 914106d547a6..dd1e1e663072 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -22,7 +22,6 @@
>  #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>
>  const efi_system_table_t *efi_system_table;
> -const efi_dxe_services_table_t *efi_dxe_table;
>  extern u32 image_offset;
>  static efi_loaded_image_t *image = NULL;
>
> @@ -401,15 +400,6 @@ static void setup_sections_memory_protection(void *image_base,
>                                              unsigned long init_size)
>  {
>  #ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
> -       efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> -
> -       if (!efi_dxe_table ||
> -           efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
> -               efi_warn("Unable to locate EFI DXE services table\n");
> -               efi_dxe_table = NULL;
> -               return;
> -       }
> -
>         extern char _head[], _ehead[];
>         extern char _compressed[], _ecompressed[];
>         extern char _text[], _etext[];
> @@ -791,13 +781,6 @@ unsigned long efi_main(efi_handle_t handle,
>         if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
>                 efi_exit(handle, EFI_INVALID_PARAMETER);
>
> -       efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> -       if (efi_dxe_table &&
> -           efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
> -               efi_warn("Ignoring DXE services table: invalid signature\n");
> -               efi_dxe_table = NULL;
> -       }
> -
>  #ifndef CONFIG_EFI_STUB_EXTRACT_DIRECT
>         /*
>          * If the kernel isn't already loaded at a suitable address,
> --
> 2.35.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] efi/libstub: make memory protection warnings include newlines.
  2022-10-18 20:51   ` [PATCH] efi/libstub: make memory protection warnings include newlines Peter Jones
@ 2022-10-19  7:44     ` Ard Biesheuvel
  0 siblings, 0 replies; 51+ messages in thread
From: Ard Biesheuvel @ 2022-10-19  7:44 UTC (permalink / raw)
  To: Peter Jones
  Cc: Evgeniy Baskov, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

On Tue, 18 Oct 2022 at 22:51, Peter Jones <pjones@redhat.com> wrote:
>
> efi_warn() doesn't put newlines on messages, and that makes reading
> warnings without newlines hard to do.
>
> Signed-off-by: Peter Jones <pjones@redhat.com>

OK, so this applies on top of Evgeniy's series, right? Do we need a
version for 6.1-rc1 ?

> ---
>  drivers/firmware/efi/libstub/mem.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
> index 4d6c7f4fb7e..1b874096109 100644
> --- a/drivers/firmware/efi/libstub/mem.c
> +++ b/drivers/firmware/efi/libstub/mem.c
> @@ -293,7 +293,7 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>                                 rounded_end - rounded_start,
>                                 attr_clear);
>         if (status != EFI_SUCCESS) {
> -               efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx",
> +               efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx\n",
>                          (unsigned long)rounded_start,
>                          (unsigned long)rounded_end,
>                          status);
> @@ -306,7 +306,7 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>                                 rounded_end - rounded_start,
>                                 attributes);
>         if (status != EFI_SUCCESS) {
> -               efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx",
> +               efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx\n",
>                          (unsigned long)rounded_start,
>                          (unsigned long)rounded_end,
>                          status);
> --
> 2.37.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline
  2022-09-06 10:41 ` [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
  2022-10-19  7:06   ` Ard Biesheuvel
@ 2022-10-19  7:44   ` Andrew Cooper
  2022-10-20 13:25     ` Evgeniy Baskov
  1 sibling, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2022-10-19  7:44 UTC (permalink / raw)
  To: Evgeniy Baskov, Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening, Andrew Cooper

On 06/09/2022 11:41, Evgeniy Baskov wrote:
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index d33f060900d2..5273367283b7 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -619,9 +619,8 @@ SYM_CODE_START(trampoline_32bit_src)
>  	/* Set up new stack */
>  	leal	TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
>  
> -	/* Disable paging */
> -	movl	%cr0, %eax
> -	btrl	$X86_CR0_PG_BIT, %eax
> +	/* Disable paging and setup CR0 */
> +	movl	$(CR0_STATE & ~X86_CR0_PG), %eax

Why here?  WP is ignored when PG is disabled.

~Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 06/16] x86/boot: Setup memory protection for bzImage code
  2022-09-06 10:41 ` [PATCH 06/16] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
  2022-10-19  7:17   ` Ard Biesheuvel
@ 2022-10-19  7:57   ` Andrew Cooper
  2022-10-20 13:30     ` Evgeniy Baskov
  1 sibling, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2022-10-19  7:57 UTC (permalink / raw)
  To: Evgeniy Baskov, Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening, Andrew Cooper

On 06/09/2022 11:41, Evgeniy Baskov wrote:
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 5273367283b7..889ca7176aa7 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -602,6 +603,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>  	jmp	*%rax
>  SYM_FUNC_END(.Lrelocated)
>  
> +SYM_FUNC_START_LOCAL_NOALIGN(startup32_enable_nx_if_supported)
> +	pushq	%rbx
> +
> +	leaq	has_nx(%rip), %rcx
> +
> +	mov	$0x80000001, %eax
> +	cpuid
> +	btl	$20, %edx

btl $(X86_FEATURE_NX & 31), %edx

But also need to check for the availability of the extended leaf in the
first place.

> +	jnc	.Lnonx
> +
> +	movl	$1, (%rcx)

Your pointer has been clobbered with some feature flags.

movl $1, has_nx(%rip)

will work fine without needing the intermediary lea.

~Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/16] x86_64: Improvements at compressed kernel stage
  2022-10-18 21:04 ` [PATCH 00/16] x86_64: Improvements at compressed kernel stage Peter Jones
@ 2022-10-20 11:05   ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 11:05 UTC (permalink / raw)
  To: Peter Jones
  Cc: Ard Biesheuvel, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 00:04, Peter Jones wrote:
> On Tue, Sep 06, 2022 at 01:41:04PM +0300, Evgeniy Baskov wrote:
>> This patchset is aimed
>> * to improve UEFI compatibility of compressed kernel code for x86_64
>> * to setup proper memory access attributes for code and rodata 
>> sections
>> * to implement W^X protection policy throughout the whole execution
>>   of compressed kernel for EFISTUB code path.
> 
> Hi Evgeniy,
> 
> I've tested this set of patches with the Mu firmware that supports the 
> W^X
> feature and a modified bootloader to also support it, and also with an
> existing firmware and the grub2 build in fedora 36.  On the firmware
> without W^X support, this all works for me.  With W^X support, it works
> so long as I use CONFIG_EFI_STUB_EXTRACT_DIRECT, though I still need
> some changes in grub's loader.  IMO that's a big step forward.
> 
> I can't currently make it work with W^X enabled but without direct
> extraction, and I'm still investigating why not, but I figured I'd give
> you a heads up.

Hi Peter,

Thank you for testing!

Without direct extraction enabled this patch set does not implement 
total W^X
and needs to allocate RWX memory regions, since it should go through 
common
code path that relocates kernel. So if the firmware does not allow 
allocating
RWX regions, it might prevent the kernel from booting, I think. I will 
look
into that problem soon and let you know it I find anything.

Thanks,
Evgeniy Baskov

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/16] x86/boot: Align vmlinuz sections on page size
  2022-10-19  7:01   ` Ard Biesheuvel
@ 2022-10-20 11:13     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 11:13 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:01, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> To protect sections on page table level each section
>> needs to be aligned on page size (4KB).
>> 
>> Set sections alignment in linker script.
>> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>> ---
>>  arch/x86/boot/compressed/vmlinux.lds.S | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
>> b/arch/x86/boot/compressed/vmlinux.lds.S
>> index 112b2375d021..6be90f1a1198 100644
>> --- a/arch/x86/boot/compressed/vmlinux.lds.S
>> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
>> @@ -27,21 +27,27 @@ SECTIONS
>>                 HEAD_TEXT
>>                 _ehead = . ;
>>         }
>> +       . = ALIGN(PAGE_SIZE);
>>         .rodata..compressed : {
>> +               _compressed = .;
> 
> Why are you adding these?

It is used for address compressed kernel blob during memory protection 
setup.
Although it can be addressed via different symbols, I though that 
addressing
sections data in a common way (through linker generated symbols) would 
be better.
I can remove or mention the change in commit message (for now I will do 
the latter).

> 
>>                 *(.rodata..compressed)
>> +               _ecompressed = .;
>>         }
>> +       . = ALIGN(PAGE_SIZE);
> 
> On other EFI architectures, we only distinguish between R-X and RW-
> regions, and alignment between .rodata and .text is unnecessary. Do we
> really need to deviate from that here?

I though that leaving a huge compressed kernel blob executable is
undesirable, so I decided to split it out. I can make it either RW- or 
R-X
if it would be more acceptable.

> 
> 
>>         .text : {
>>                 _text = .;      /* Text */
>>                 *(.text)
>>                 *(.text.*)
>>                 _etext = . ;
>>         }
>> +       . = ALIGN(PAGE_SIZE);
>>         .rodata : {
>>                 _rodata = . ;
>>                 *(.rodata)       /* read-only data */
>>                 *(.rodata.*)
>>                 _erodata = . ;
>>         }
>> +       . = ALIGN(PAGE_SIZE);
>>         .data : {
>>                 _data = . ;
>>                 *(.data)
>> --
>> 2.35.1
>> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 02/16] x86/build: Remove RWX sections and align on 4KB
  2022-10-19  7:04   ` Ard Biesheuvel
@ 2022-10-20 11:15     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 11:15 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:04, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Avoid creating sections with maximal privileges to prepare for W^X
> 
> privileges
> 
>> implementation. Align sections on page size (4KB) to allow protecting
>> them in page table.
>> 
> 
> in the page tables.
> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>> ---
>>  arch/x86/kernel/vmlinux.lds.S | 15 ++++++++-------
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>> 
>> diff --git a/arch/x86/kernel/vmlinux.lds.S 
>> b/arch/x86/kernel/vmlinux.lds.S
>> index 15f29053cec4..6587e0201b50 100644
>> --- a/arch/x86/kernel/vmlinux.lds.S
>> +++ b/arch/x86/kernel/vmlinux.lds.S
>> @@ -102,12 +102,11 @@ jiffies = jiffies_64;
>>  PHDRS {
>>         text PT_LOAD FLAGS(5);          /* R_E */
>>         data PT_LOAD FLAGS(6);          /* RW_ */
>> -#ifdef CONFIG_X86_64
>> -#ifdef CONFIG_SMP
>> +#if defined(CONFIG_X86_64) && defined(CONFIG_SMP)
>>         percpu PT_LOAD FLAGS(6);        /* RW_ */
>>  #endif
>> -       init PT_LOAD FLAGS(7);          /* RWE */
>> -#endif
>> +       inittext PT_LOAD FLAGS(5);      /* R_E */
>> +       init PT_LOAD FLAGS(6);          /* RW_ */
> 
> Please explain in the commit log how this change affects X86_32
> 
>>         note PT_NOTE FLAGS(0);          /* ___ */
>>  }
>> 
>> @@ -226,9 +225,10 @@ SECTIONS
>>  #endif
>> 
>>         INIT_TEXT_SECTION(PAGE_SIZE)
>> -#ifdef CONFIG_X86_64
>> -       :init
>> -#endif
>> +       :inittext
>> +
>> +       . = ALIGN(PAGE_SIZE);
>> +
>> 
>>         /*
>>          * Section for code used exclusively before alternatives are 
>> run. All
>> @@ -240,6 +240,7 @@ SECTIONS
>>         .altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
>>                 *(.altinstr_aux)
>>         }
>> +       :init
>> 
>>         INIT_DATA_SECTION(16)
>> 
>> --
>> 2.35.1
>> 

Thanks for pointing you, I'll fix all mentioned nitpicks before sending
this again.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline
  2022-10-19  7:06   ` Ard Biesheuvel
@ 2022-10-20 11:23     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 11:23 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:06, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Ensure WP bit to be set to prevent boot code from writing to
>> non-writable memory pages.
>> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>> ---
>>  arch/x86/boot/compressed/head_64.S | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/boot/compressed/head_64.S 
>> b/arch/x86/boot/compressed/head_64.S
>> index d33f060900d2..5273367283b7 100644
>> --- a/arch/x86/boot/compressed/head_64.S
>> +++ b/arch/x86/boot/compressed/head_64.S
>> @@ -619,9 +619,8 @@ SYM_CODE_START(trampoline_32bit_src)
>>         /* Set up new stack */
>>         leal    TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
>> 
>> -       /* Disable paging */
>> -       movl    %cr0, %eax
>> -       btrl    $X86_CR0_PG_BIT, %eax
> 
> Why do we no longer care about CR0's prior value?

I think we don't need to preserve any of those flags
(we nether use floating point instructions nor call EFI functions
with this cr0 value) and it's better to set cr0 to the well-known
state. CR0 is also being set to the constant value while switching
from protected to long mode, so it is already done in one of the
code paths.

If I am missing something, let me know,
I will change it to only set WP and clear PG.

> 
>> +       /* Disable paging and setup CR0 */
>> +       movl    $(CR0_STATE & ~X86_CR0_PG), %eax
>>         movl    %eax, %cr0
>> 
>>         /* Check what paging mode we want to be in after the 
>> trampoline */
>> --
>> 2.35.1
>> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 04/16] x86/boot: Increase boot page table size
  2022-10-19  7:08   ` Ard Biesheuvel
@ 2022-10-20 11:29     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 11:29 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:08, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:41, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Previous calculations ignored pages implicitly mapped by ACPI code,
> 
> I'm not sure I understand what this means. Which ACPI code and which
> pages does it map?

Code from boot/compressed/{acpi.c,efi.c} that touches ACPI/EFI tables
is currently mapping pages that contain the tables implicitly by
causing page faults. And those mappings may require additional
memory for page tables. It became more apparent when I were removing
memory mapping from page fault handler.

> 
>> so theoretical upper limit is higher than was set.
...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/16] x86/boot: Support 4KB pages for identity mapping
  2022-10-19  7:11   ` Ard Biesheuvel
@ 2022-10-20 11:30     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 11:30 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:11, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Current identity mapping code only supports 2M and 1G pages.
>> 4KB pages are desirable for better memory protection granularity
>> in compressed kernel code.
>> 
>> Change identity mapping code to support 4KB pages and
>> memory remapping with different attributes.
>> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
> 
> This looks reasonable to me but someone on team-x86 will need to review 
> this.
> 
> One nit below

Thanks!

> 
>> ---
>>  arch/x86/include/asm/init.h |   1 +
>>  arch/x86/mm/ident_map.c     | 186 
>> +++++++++++++++++++++++++++++-------
>>  2 files changed, 155 insertions(+), 32 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
>> index 5f1d3c421f68..a8277ee82c51 100644
>> --- a/arch/x86/include/asm/init.h
>> +++ b/arch/x86/include/asm/init.h
>> @@ -8,6 +8,7 @@ struct x86_mapping_info {
>>         unsigned long page_flag;         /* page flag for PMD or PUD 
>> entry */
>>         unsigned long offset;            /* ident mapping offset */
>>         bool direct_gbpages;             /* PUD level 1GB page support 
>> */
>> +       bool allow_4kpages;              /* Allow more granular 
>> mappings with 4K pages */
>>         unsigned long kernpg_flag;       /* kernel pagetable flag 
>> override */
>>  };
>> 
>> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
>> index 968d7005f4a7..ad455d4ef595 100644
>> --- a/arch/x86/mm/ident_map.c
>> +++ b/arch/x86/mm/ident_map.c
>> @@ -2,26 +2,130 @@
>>  /*
>>   * Helper routines for building identity mapping page tables. This is
>>   * included by both the compressed kernel and the regular kernel.
>> + *
> 
> Drop this change
> 
>>   */
>> 
>> -static void ident_pmd_init(struct x86_mapping_info *info, pmd_t 
>> *pmd_page,
>> -                          unsigned long addr, unsigned long end)
>> +static void ident_pte_init(struct x86_mapping_info *info, pte_t 
>> *pte_page,
>> +                          unsigned long addr, unsigned long end,
>> +                          unsigned long flags)
>>  {
>> -       addr &= PMD_MASK;
>> -       for (; addr < end; addr += PMD_SIZE) {
>> +       addr &= PAGE_MASK;
>> +       for (; addr < end; addr += PAGE_SIZE) {
>> +               pte_t *pte = pte_page + pte_index(addr);
>> +
>> +               set_pte(pte, __pte((addr - info->offset) | flags));
>> +       }
>> +}
>> +
>> +pte_t *ident_split_large_pmd(struct x86_mapping_info *info,
>> +                            pmd_t *pmdp, unsigned long page_addr)
>> +{
>> +       unsigned long pmd_addr, page_flags;
>> +       pte_t *pte;
>> +
>> +       pte = (pte_t *)info->alloc_pgt_page(info->context);
>> +       if (!pte)
>> +               return NULL;
>> +
>> +       pmd_addr = page_addr & PMD_MASK;
>> +
>> +       /* Not a large page - clear PSE flag */
>> +       page_flags = pmd_flags(*pmdp) & ~_PSE;
>> +       ident_pte_init(info, pte, pmd_addr, pmd_addr + PMD_SIZE, 
>> page_flags);
>> +
>> +       return pte;
>> +}
>> +
>> +static int ident_pmd_init(struct x86_mapping_info *info, pmd_t 
>> *pmd_page,
>> +                         unsigned long addr, unsigned long end,
>> +                         unsigned long flags)
>> +{
>> +       unsigned long next;
>> +       bool new_table = 0;
>> +
>> +       for (; addr < end; addr = next) {
>>                 pmd_t *pmd = pmd_page + pmd_index(addr);
>> +               pte_t *pte;
>> 
>> -               if (pmd_present(*pmd))
>> +               next = (addr & PMD_MASK) + PMD_SIZE;
>> +               if (next > end)
>> +                       next = end;
>> +
>> +               /*
>> +                * Use 2M pages if 4k pages are not allowed or
>> +                * we are not mapping extra, i.e. address and size are 
>> aligned.
>> +                */
>> +
>> +               if (!info->allow_4kpages ||
>> +                   (!(addr & ~PMD_MASK) && next == addr + PMD_SIZE)) 
>> {
>> +
>> +                       pmd_t pmdval;
>> +
>> +                       addr &= PMD_MASK;
>> +                       pmdval = __pmd((addr - info->offset) | flags | 
>> _PSE);
>> +                       set_pmd(pmd, pmdval);
>>                         continue;
>> +               }
>> +
>> +               /*
>> +                * If currently mapped page is large, we need to split 
>> it.
>> +                * The case when we can remap 2M page to 2M page
>> +                * with different flags is already covered above.
>> +                *
>> +                * If there's nothing mapped to desired address,
>> +                * we need to allocate new page table.
>> +                */
>> 
>> -               set_pmd(pmd, __pmd((addr - info->offset) | 
>> info->page_flag));
>> +               if (pmd_large(*pmd)) {
>> +                       pte = ident_split_large_pmd(info, pmd, addr);
>> +                       new_table = 1;
>> +               } else if (!pmd_present(*pmd)) {
>> +                       pte = (pte_t 
>> *)info->alloc_pgt_page(info->context);
>> +                       new_table = 1;
>> +               } else {
>> +                       pte = pte_offset_kernel(pmd, 0);
>> +                       new_table = 0;
>> +               }
>> +
>> +               if (!pte)
>> +                       return -ENOMEM;
>> +
>> +               ident_pte_init(info, pte, addr, next, flags);
>> +
>> +               if (new_table)
>> +                       set_pmd(pmd, __pmd(__pa(pte) | 
>> info->kernpg_flag));
>>         }
>> +
>> +       return 0;
>>  }
>> 
>> +
>> +pmd_t *ident_split_large_pud(struct x86_mapping_info *info,
>> +                            pud_t *pudp, unsigned long page_addr)
>> +{
>> +       unsigned long pud_addr, page_flags;
>> +       pmd_t *pmd;
>> +
>> +       pmd = (pmd_t *)info->alloc_pgt_page(info->context);
>> +       if (!pmd)
>> +               return NULL;
>> +
>> +       pud_addr = page_addr & PUD_MASK;
>> +
>> +       /* Not a large page - clear PSE flag */
>> +       page_flags = pud_flags(*pudp) & ~_PSE;
>> +       ident_pmd_init(info, pmd, pud_addr, pud_addr + PUD_SIZE, 
>> page_flags);
>> +
>> +       return pmd;
>> +}
>> +
>> +
>>  static int ident_pud_init(struct x86_mapping_info *info, pud_t 
>> *pud_page,
>>                           unsigned long addr, unsigned long end)
>>  {
>>         unsigned long next;
>> +       bool new_table = 0;
>> +       int result;
>> 
>>         for (; addr < end; addr = next) {
>>                 pud_t *pud = pud_page + pud_index(addr);
>> @@ -31,28 +135,39 @@ static int ident_pud_init(struct x86_mapping_info 
>> *info, pud_t *pud_page,
>>                 if (next > end)
>>                         next = end;
>> 
>> +               /* Use 1G pages only if forced, even if they are 
>> supported. */
>>                 if (info->direct_gbpages) {
>>                         pud_t pudval;
>> -
>> -                       if (pud_present(*pud))
>> -                               continue;
>> +                       unsigned long flags;
>> 
>>                         addr &= PUD_MASK;
>> -                       pudval = __pud((addr - info->offset) | 
>> info->page_flag);
>> +                       flags = info->page_flag | _PSE;
>> +                       pudval = __pud((addr - info->offset) | flags);
>> +
>>                         set_pud(pud, pudval);
>>                         continue;
>>                 }
>> 
>> -               if (pud_present(*pud)) {
>> +               if (pud_large(*pud)) {
>> +                       pmd = ident_split_large_pud(info, pud, addr);
>> +                       new_table = 1;
>> +               } else if (!pud_present(*pud)) {
>> +                       pmd = (pmd_t 
>> *)info->alloc_pgt_page(info->context);
>> +                       new_table = 1;
>> +               } else {
>>                         pmd = pmd_offset(pud, 0);
>> -                       ident_pmd_init(info, pmd, addr, next);
>> -                       continue;
>> +                       new_table = 0;
>>                 }
>> -               pmd = (pmd_t *)info->alloc_pgt_page(info->context);
>> +
>>                 if (!pmd)
>>                         return -ENOMEM;
>> -               ident_pmd_init(info, pmd, addr, next);
>> -               set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
>> +
>> +               result = ident_pmd_init(info, pmd, addr, next, 
>> info->page_flag);
>> +               if (result)
>> +                       return result;
>> +
>> +               if (new_table)
>> +                       set_pud(pud, __pud(__pa(pmd) | 
>> info->kernpg_flag));
>>         }
>> 
>>         return 0;
>> @@ -63,6 +178,7 @@ static int ident_p4d_init(struct x86_mapping_info 
>> *info, p4d_t *p4d_page,
>>  {
>>         unsigned long next;
>>         int result;
>> +       bool new_table = 0;
>> 
>>         for (; addr < end; addr = next) {
>>                 p4d_t *p4d = p4d_page + p4d_index(addr);
>> @@ -72,15 +188,14 @@ static int ident_p4d_init(struct x86_mapping_info 
>> *info, p4d_t *p4d_page,
>>                 if (next > end)
>>                         next = end;
>> 
>> -               if (p4d_present(*p4d)) {
>> +               if (!p4d_present(*p4d)) {
>> +                       pud = (pud_t 
>> *)info->alloc_pgt_page(info->context);
>> +                       new_table = 1;
>> +               } else {
>>                         pud = pud_offset(p4d, 0);
>> -                       result = ident_pud_init(info, pud, addr, 
>> next);
>> -                       if (result)
>> -                               return result;
>> -
>> -                       continue;
>> +                       new_table = 0;
>>                 }
>> -               pud = (pud_t *)info->alloc_pgt_page(info->context);
>> +
>>                 if (!pud)
>>                         return -ENOMEM;
>> 
>> @@ -88,19 +203,22 @@ static int ident_p4d_init(struct x86_mapping_info 
>> *info, p4d_t *p4d_page,
>>                 if (result)
>>                         return result;
>> 
>> -               set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
>> +               if (new_table)
>> +                       set_p4d(p4d, __p4d(__pa(pud) | 
>> info->kernpg_flag));
>>         }
>> 
>>         return 0;
>>  }
>> 
>> -int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t 
>> *pgd_page,
>> -                             unsigned long pstart, unsigned long 
>> pend)
>> +int kernel_ident_mapping_init(struct x86_mapping_info *info,
>> +                             pgd_t *pgd_page, unsigned long pstart,
>> +                             unsigned long pend)
>>  {
>>         unsigned long addr = pstart + info->offset;
>>         unsigned long end = pend + info->offset;
>>         unsigned long next;
>>         int result;
>> +       bool new_table;
>> 
>>         /* Set the default pagetable flags if not supplied */
>>         if (!info->kernpg_flag)
>> @@ -117,20 +235,24 @@ int kernel_ident_mapping_init(struct 
>> x86_mapping_info *info, pgd_t *pgd_page,
>>                 if (next > end)
>>                         next = end;
>> 
>> -               if (pgd_present(*pgd)) {
>> +               if (!pgd_present(*pgd)) {
>> +                       p4d = (p4d_t 
>> *)info->alloc_pgt_page(info->context);
>> +                       new_table = 1;
>> +               } else {
>>                         p4d = p4d_offset(pgd, 0);
>> -                       result = ident_p4d_init(info, p4d, addr, 
>> next);
>> -                       if (result)
>> -                               return result;
>> -                       continue;
>> +                       new_table = 0;
>>                 }
>> 
>> -               p4d = (p4d_t *)info->alloc_pgt_page(info->context);
>>                 if (!p4d)
>>                         return -ENOMEM;
>> +
>>                 result = ident_p4d_init(info, p4d, addr, next);
>>                 if (result)
>>                         return result;
>> +
>> +               if (!new_table)
>> +                       continue;
>> +
>>                 if (pgtable_l5_enabled()) {
>>                         set_pgd(pgd, __pgd(__pa(p4d) | 
>> info->kernpg_flag));
>>                 } else {
>> --
>> 2.35.1
>> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 06/16] x86/boot: Setup memory protection for bzImage code
  2022-10-19  7:17   ` Ard Biesheuvel
@ 2022-10-20 12:07     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 12:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:17, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Use previously added code to use 4KB pages for mapping. Map compressed
>> and uncompressed kernel with appropriate memory protection attributes.
>> For compressed kernel set them up manually. For uncompressed kernel
>> used flags specified in ELF header.
>> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
...
>> 
>>  /*
>>   * Locally defined symbols should be marked hidden:
>> @@ -578,6 +578,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>         pushq   %rsi
>>         call    load_stage2_idt
>> 
>> +       call    startup32_enable_nx_if_supported
>>         /* Pass boot_params to initialize_identity_maps() */
>>         movq    (%rsp), %rdi
>>         call    initialize_identity_maps
>> @@ -602,6 +603,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>         jmp     *%rax
>>  SYM_FUNC_END(.Lrelocated)
>> 
>> +SYM_FUNC_START_LOCAL_NOALIGN(startup32_enable_nx_if_supported)
> 
> Why the startup32_ prefix for this function name?

Oh, right there is no reasons, I will remove it.
...
>>  /*
>>   * Adds the specified range to the identity mappings.
>>   */
>> -void kernel_add_identity_map(unsigned long start, unsigned long end)
>> +unsigned long kernel_add_identity_map(unsigned long start,
>> +                                     unsigned long end,
>> +                                     unsigned int flags)
>>  {
>>         int ret;
>> 
>>         /* Align boundary to 2M. */
>> -       start = round_down(start, PMD_SIZE);
>> -       end = round_up(end, PMD_SIZE);
>> +       start = round_down(start, PAGE_SIZE);
>> +       end = round_up(end, PAGE_SIZE);
>>         if (start >= end)
>> -               return;
>> +               return start;
>> +
>> +       /* Enforce W^X -- just stop booting with error on violation. 
>> */
>> +       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) &&
>> +           (flags & (MAP_EXEC | MAP_WRITE)) == (MAP_EXEC | 
>> MAP_WRITE))
>> +               error("Error: W^X violation\n");
>> +
> 
> Do we need to add a new failure mode here?

It seems reasonable to me to leave it here to avoid unintentionally 
introducing
RWX mappings. And this function can already fail on OOM situation.
I can change it to warning if failure is too harsh in this situation.
> 
>> +       bool nx = !(flags & MAP_EXEC) && has_nx;
>> +       bool ro = !(flags & MAP_WRITE);
>> +
...
>> -       kernel_add_identity_map((unsigned long)_head, (unsigned 
>> long)_end);
>> -       boot_params = rmode;
>> -       kernel_add_identity_map((unsigned long)boot_params, (unsigned 
>> long)(boot_params + 1));
>> +       extern char _head[], _ehead[];
> 
> Please move these extern declarations out of the function scope (at
> the very least)

I will move it to misc.h then, there are already some of these 
declarations present.

> 
>> +       kernel_add_identity_map((unsigned long)_head,
>> +                               (unsigned long)_ehead, MAP_EXEC | 
>> MAP_NOFLUSH);
>> +
>> +       extern char _compressed[], _ecompressed[];
>> +       kernel_add_identity_map((unsigned long)_compressed,
>> +                               (unsigned long)_ecompressed, MAP_WRITE 
>> | MAP_NOFLUSH);
>> +
>> +       extern char _text[], _etext[];
>> +       kernel_add_identity_map((unsigned long)_text,
>> +                               (unsigned long)_etext, MAP_EXEC | 
>> MAP_NOFLUSH);
>> +
>> +       extern char _rodata[], _erodata[];
>> +       kernel_add_identity_map((unsigned long)_rodata,
>> +                               (unsigned long)_erodata, MAP_NOFLUSH);
>> +
> 
> Same question as before: do we really need three different regions for
> rodata+text here?

As I already told, I think, its undesirable to leave compressed kernel 
blob
(and .rodata) executable, as it it will provide higher attack surface if 
some
control flow interception vulnerability in this code would be 
discovered,
and though I am not aware of such vulnerabilities to be present 
currently,
I think, additional security is not redundant, since it can be provided 
almost
for free.

I can merge these regions, if you think it does not worth it.

> 
...
>> +                       /*
>> +                        * Simultaneously readable and writable 
>> segments are
>> +                        * violating W^X, and should not be present in 
>> vmlinux image.
>> +                        */
>> +                       if ((phdr->p_flags & (PF_X | PF_W)) == (PF_X | 
>> PF_W))
>> +                               error("W^X violation for ELF 
>> segment");
>> +
> 
> Can we catch this at build time instead?

Thanks, thats great idea! I will implement that in tools/build.c

> 
...
>> +#else
>> +static inline unsigned long kernel_add_identity_map(unsigned long 
>> start,
>> +                                                   unsigned long end,
>> +                                                   unsigned int 
>> flags)
>> +{
>> +       (void)flags;
>> +       (void)end;
> 
> Why these (void) casts? Can we just drop them?
> 

Unused parameters used to cause warnings for me here somehow...
I will drop them.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 10/16] x86/boot: Make console interface more abstract
  2022-10-19  7:23   ` Ard Biesheuvel
@ 2022-10-20 12:10     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 12:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:23, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> To be able to extract kernel from EFI, console output functions
>> need to be replaceable by alternative implementations.
>> 
>> Make all of those functions pointers.
>> Move serial console code to separate file.
>> 
> 
> What does kernel_add_identity_map() have to do with the above? Should
> that be a separate patch?

It used to be dependent, but no longer is.
I'll split making kernel_add_identity_map() a function pointer
out into separate patch.

> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>> ---
>>  arch/x86/boot/compressed/Makefile       |   2 +-
>>  arch/x86/boot/compressed/ident_map_64.c |  15 ++-
>>  arch/x86/boot/compressed/misc.c         | 109 +--------------------
>>  arch/x86/boot/compressed/misc.h         |  13 ++-
>>  arch/x86/boot/compressed/putstr.c       | 124 
>> ++++++++++++++++++++++++
>>  5 files changed, 146 insertions(+), 117 deletions(-)
>>  create mode 100644 arch/x86/boot/compressed/putstr.c
...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 12/16] x86/boot: Add EFI kernel extraction interface
  2022-10-19  7:27   ` Ard Biesheuvel
@ 2022-10-20 12:14     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 12:14 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:27, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
...
>> 
>> +void init_efi_console(struct efi_iofunc *iofunc)
> 
> struct efi_iofunc does not exist yet
> 

My bad, will move the definition from the next patch to this one.
Thanks for pointing out.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 13/16] efi/x86: Support extracting kernel from libstub
  2022-10-19  7:35   ` Ard Biesheuvel
@ 2022-10-20 12:36     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 12:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:35, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Doing it that way allows setting up stricter memory attributes,
>> simplifies boot code path and removes potential relocation
>> of kernel image.
>> 
>> Wire up required interfaces and minimally initialize zero page
>> fields needed for it to function correctly.
>> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>> 
>>  create mode 100644 arch/x86/include/asm/shared/extract.h
>>  create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
>> ---
>>  arch/x86/boot/compressed/head_32.S            |   6 +-
>>  arch/x86/boot/compressed/head_64.S            |  45 ++++
>>  arch/x86/include/asm/shared/extract.h         |  25 ++
>>  drivers/firmware/efi/Kconfig                  |  14 ++
>>  drivers/firmware/efi/libstub/Makefile         |   1 +
>>  drivers/firmware/efi/libstub/efistub.h        |   5 +
>>  .../firmware/efi/libstub/x86-extract-direct.c | 220 
>> ++++++++++++++++++
>>  drivers/firmware/efi/libstub/x86-stub.c       |  45 ++--
>>  8 files changed, 343 insertions(+), 18 deletions(-)
>>  create mode 100644 arch/x86/include/asm/shared/extract.h
>>  create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
>> 
>> diff --git a/arch/x86/boot/compressed/head_32.S 
>> b/arch/x86/boot/compressed/head_32.S
>> index b46a1c4109cf..d2866f06bc9f 100644
>> --- a/arch/x86/boot/compressed/head_32.S
>> +++ b/arch/x86/boot/compressed/head_32.S
>> @@ -155,7 +155,11 @@ SYM_FUNC_START(efi32_stub_entry)
>>         add     $0x4, %esp
>>         movl    8(%esp), %esi   /* save boot_params pointer */
>>         call    efi_main
>> -       /* efi_main returns the possibly relocated address of 
>> startup_32 */
>> +
>> +       /*
>> +        * efi_main returns the possibly
>> +        * relocated address of exteracted kernel entry point.
> 
> extracted

Thanks, will fix.
> 

...
>> diff --git a/arch/x86/include/asm/shared/extract.h 
>> b/arch/x86/include/asm/shared/extract.h
>> new file mode 100644
>> index 000000000000..163678145884
>> --- /dev/null
>> +++ b/arch/x86/include/asm/shared/extract.h
>> @@ -0,0 +1,25 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef ASM_SHARED_EXTRACT_H
>> +#define ASM_SHARED_EXTRACT_H
>> +
>> +#define MAP_WRITE      0x02 /* Writable memory */
>> +#define MAP_EXEC       0x04 /* Executable memory */
>> +#define MAP_ALLOC      0x10 /* Range needs to be allocated */
>> +#define MAP_PROTECT    0x20 /* Set exact memory attributes for memory 
>> range */
>> +
>> +struct efi_iofunc {
>> +       void (*putstr)(const char *msg);
>> +       void (*puthex)(unsigned long x);
>> +       unsigned long (*map_range)(unsigned long start,
>> +                                  unsigned long end,
>> +                                  unsigned int flags);
> 
> This looks a bit random - having a map_range() routine as a member of
> the console I/O struct. Can we make this abstraction a bit more
> natural?

Hmm, I can either change the name of this stucture
to something more generic (like efi_extract_callbacks) or
split map_range separately as a separate function argument.
(Renaming seems simpler, so I will do that for now.)

>> +};
>> +
>> +void *efi_extract_kernel(struct boot_params *rmode,
>> +                        struct efi_iofunc *iofunc,
>> +                        unsigned char *input_data,
>> +                        unsigned long input_len,
>> +                        unsigned char *output,
>> +                        unsigned long output_len);
>> +
>> +#endif /* ASM_SHARED_EXTRACT_H */
>> diff --git a/drivers/firmware/efi/Kconfig 
>> b/drivers/firmware/efi/Kconfig
>> index 6cb7384ad2ac..2418402a0bda 100644
>> --- a/drivers/firmware/efi/Kconfig
>> +++ b/drivers/firmware/efi/Kconfig
>> @@ -91,6 +91,20 @@ config EFI_DXE_MEM_ATTRIBUTES
>>           Use DXE services to check and alter memory protection
>>           attributes during boot via EFISTUB to ensure that memory
>>           ranges used by the kernel are writable and executable.
>> +         This option also enables stricter memory attributes
>> +         on compressed kernel PE image.
>> +
>> +config EFI_STUB_EXTRACT_DIRECT
>> +       bool "Extract kernel directly from UEFI environment"
>> +       depends on EFI && EFI_STUB && X86_64
>> +       default y
> 
> What is the reason for making this configurable? Couldn't we just
> enable it unconditionally?
> 
When I first implemented it it was too hackish, but now it seems OK, so
I can make it unconditional, and it will make things simpler in several
places. Although making it work on x86_32 will require some additional 
work.

Also kernel with EFI_STUB_EXTRACT_DIRECT disabled breaks boot process 
with Mu
firmware when W^X enabled, as pointed out by Peter.

So, I guess, I will just remove the switch.

>> 
>> +#ifdef CONFIG_X86
>> +unsigned long extract_kernel_direct(struct boot_params *boot_params);
>> +void startup_32(struct boot_params *boot_params);
>> +#endif
>> +
> 
> Please put this somewhere else
> 

Will adding little x86-specific header file for these be appropriate?

>> +
>> +#include "efistub.h"
>> +
>> +static void do_puthex(unsigned long value);
>> +static void do_putstr(const char *msg);
>> +
> 
> Can we get rid of these forward declarations?
> 

Yes, I will move those functions here and remove declarations.

...
>> +       /* First page of trampoline is a top level page table */
>> +       efi_adjust_memory_range_protection(trampoline_start,
>> +                                          PAGE_SIZE,
>> +                                          EFI_MEMORY_XP);
>> +
>> +       /* Second page of trampoline is the code (with a padding) */
>> +       status = efi_get_memory_map(&map);
> 
> efi_get_memory_map() has been updated in the mean time, so this needs a 
> rewrite.

Yep, it needs a rebase now.
> 
...
>>  setup_memory_protection(unsigned long image_base, unsigned long 
>> image_size)
>>  {
>>         /*
>> -        * Allow execution of possible trampoline used
>> -        * for switching between 4- and 5-level page tables
>> -        * and relocated kernel image.
>> -        */
>> +       * Allow execution of possible trampoline used
>> +       * for switching between 4- and 5-level page tables
>> +       * and relocated kernel image.
>> +       */
>> 
> 
> Drop this hunk please

That was unintentional, thanks.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 14/16] x86/build: Make generated PE more spec compliant
  2022-10-19  7:39   ` Ard Biesheuvel
@ 2022-10-20 13:07     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 13:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:39, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Currently kernel image is not fully compliant PE image, so it may
>> fail to boot with stricter implementations of UEFI PE loaders.
>> 
>> Set minimal alignments and sizes specified by PE documentation [1]
>> referenced by UEFI specification [2]. Align PE header to 8 bytes.
> 
> 
>> Generate '.reloc' section with 2 entries and set reloc data directory.
> 
> Why?

It seems to me that I saw minimal size requirement in MS documentation,
but now I cannot find the proof of my words, so I've probably misread.
So I'll drop this change.
> 
> 
>> 
>> To make code more readable refactor tools/build.c:
>>         - Use mmap() to access kernel image.
>>         - Generate sections dynamically.
>>         - Setup sections protection. Since we cannot fit every
>>           needed section, set a part of protection flags
>>           dynamically during initialization. This step is omitted
>>           if CONFIG_EFI_DXE_MEM_ATTRIBUTES is not set.
>> 
> 
> If the commit log of a patch contains a bulleted list of the changes
> that it implements, it is a very strong indicator that it needs to be
> split up. Presenting this as a big ball of changes makes the life of a
> reviewed unnecessarily hard.
> 

Sorry for that, I'll try to separate this into several patches.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 16/16] efi/libstub: Use memory attribute protocol
  2022-10-19  7:42   ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Ard Biesheuvel
@ 2022-10-20 13:13     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 13:13 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov, lvc-project,
	x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:42, Ard Biesheuvel wrote:
> On Tue, 6 Sept 2022 at 12:42, Evgeniy Baskov <baskov@ispras.ru> wrote:
>> 
>> Add EFI_MEMORY_ATTRIBUTE_PROTOCOL as preferred alternative to DXE
>> services for changing memory attributes in the EFISTUB.
>> 
>> Use DXE services only as a fallback in case aforementioned protocol
>> is not supported by UEFI implementation.
>> 
>> Move DXE services initialization code closer to the place they are 
>> used
>> to match EFI_MEMORY_ATTRIBUTE_PROTOCOL initialization code.
>> 
>> Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
>> ---
>>  drivers/firmware/efi/libstub/mem.c      | 166 
>> ++++++++++++++++++------
>>  drivers/firmware/efi/libstub/x86-stub.c |  17 ---
>>  2 files changed, 127 insertions(+), 56 deletions(-)
>> 
>> diff --git a/drivers/firmware/efi/libstub/mem.c 
>> b/drivers/firmware/efi/libstub/mem.c
>> index 89ebc8ad2c22..8c8782993b30 100644
>> --- a/drivers/firmware/efi/libstub/mem.c
>> +++ b/drivers/firmware/efi/libstub/mem.c
>> @@ -5,6 +5,9 @@
>> 
>>  #include "efistub.h"
>> 
>> +const efi_dxe_services_table_t *efi_dxe_table;
>> +efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
>> +
>>  static inline bool mmap_has_headroom(unsigned long buff_size,
>>                                      unsigned long map_size,
>>                                      unsigned long desc_size)
>> @@ -131,50 +134,32 @@ void efi_free(unsigned long size, unsigned long 
>> addr)
>>         efi_bs_call(free_pages, addr, nr_pages);
>>  }
>> 
>> -/**
>> - * efi_adjust_memory_range_protection() - change memory range 
>> protection attributes
>> - * @start:     memory range start address
>> - * @size:      memory range size
>> - *
>> - * Actual memory range for which memory attributes are modified is
>> - * the smallest ranged with start address and size aligned to 
>> EFI_PAGE_SIZE
>> - * that includes [start, start + size].
>> - *
>> - * @return: status code
>> - */
>> -efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>> -                                               unsigned long size,
>> -                                               unsigned long 
>> attributes)
>> +static void retrive_dxe_table(void)
> 
> retrieve
> 
>> +{
>> +       efi_dxe_table = 
>> get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>> +       if (efi_dxe_table &&
>> +           efi_dxe_table->hdr.signature != 
>> EFI_DXE_SERVICES_TABLE_SIGNATURE) {
>> +               efi_warn("Ignoring DXE services table: invalid 
>> signature\n");
>> +               efi_dxe_table = NULL;
>> +       }
>> +}
>> +
>> +static efi_status_t adjust_mem_attrib_dxe(efi_physical_addr_t 
>> rounded_start,
>> +                                         efi_physical_addr_t 
>> rounded_end,
>> +                                         unsigned long attributes)
>>  {
>>         efi_status_t status;
>>         efi_gcd_memory_space_desc_t desc;
>> -       efi_physical_addr_t end, next;
>> -       efi_physical_addr_t rounded_start, rounded_end;
>> +       efi_physical_addr_t end, next, start;
>>         efi_physical_addr_t unprotect_start, unprotect_size;
>>         int has_system_memory = 0;
>> 
>> -       if (efi_dxe_table == NULL)
>> -               return EFI_UNSUPPORTED;
>> +       if (!efi_dxe_table) {
>> +               retrive_dxe_table();
> 
> Same here
> 
>> 
>> -       /*
>> -        * This function should not be used to modify attributes
>> -        * other than writable/executable.
>> -        */
>> -
>> -       if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
>> -               return EFI_INVALID_PARAMETER;
>> -
>> -       /*
>> -        * Disallow simultaniously executable and writable memory
>> -        * to inforce W^X policy if direct extraction code is enabled.
>> -        */
>> -
>> -       if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
>> -           IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))
>> -               return EFI_INVALID_PARAMETER;
>> -
>> -       rounded_start = rounddown(start, EFI_PAGE_SIZE);
>> -       rounded_end = roundup(start + size, EFI_PAGE_SIZE);
>> +               if (!efi_dxe_table)
>> +                       return EFI_UNSUPPORTED;
>> +       }
>> 
>>         /*
>>          * Don't modify memory region attributes, they are
>> @@ -182,14 +167,15 @@ efi_status_t 
>> efi_adjust_memory_range_protection(unsigned long start,
>>          * encounter firmware bugs.
>>          */
>> 
>> -       for (end = start + size; start < end; start = next) {
>> +
>> +       for (start = rounded_start, end = rounded_end; start < end; 
>> start = next) {
>> 
>>                 status = efi_dxe_call(get_memory_space_descriptor,
>>                                       start, &desc);
>> 
>>                 if (status != EFI_SUCCESS) {
>>                         efi_warn("Unable to get memory descriptor at 
>> %lx\n",
>> -                                start);
>> +                                (unsigned long)start);
>>                         return status;
>>                 }
>> 
>> @@ -231,3 +217,105 @@ efi_status_t 
>> efi_adjust_memory_range_protection(unsigned long start,
>> 
>>         return EFI_SUCCESS;
>>  }
>> +
>> +static void retrive_memory_attributes_proto(void)
> 
> and here
> 
>> +{
>> +       efi_status_t status;
>> +       efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
>> +
>> +       status = efi_bs_call(locate_protocol, &guid, NULL,
>> +                            (void **)&efi_mem_attrib_proto);
>> +       if (status != EFI_SUCCESS)
>> +               efi_mem_attrib_proto = NULL;
>> +}
>> +
>> +/**
>> + * efi_adjust_memory_range_protection() - change memory range 
>> protection attributes
>> + * @start:     memory range start address
>> + * @size:      memory range size
>> + *
>> + * Actual memory range for which memory attributes are modified is
>> + * the smallest ranged with start address and size aligned to 
>> EFI_PAGE_SIZE
>> + * that includes [start, start + size].
>> + *
>> + * This function first attempts to use EFI_MEMORY_ATTRIBUTE_PROTOCOL,
>> + * that is a part of UEFI Specification since version 2.10.
>> + * If the protocol is unavailable it falls back to DXE services 
>> functions.
>> + *
>> + * @return: status code
>> + */
>> +efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>> +                                               unsigned long size,
>> +                                               unsigned long 
>> attributes)
>> +{
>> +       efi_status_t status;
>> +       efi_physical_addr_t rounded_start, rounded_end;
>> +       unsigned long attr_clear;
>> +
>> +       /*
>> +        * This function should not be used to modify attributes
>> +        * other than writable/executable.
>> +        */
>> +
>> +       if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
>> +               return EFI_INVALID_PARAMETER;
>> +
>> +       /*
>> +        * Disallow simultaniously executable and writable memory
> 
> simultaneously
> 
>> +        * to inforce W^X policy if direct extraction code is enabled.
> 
> enforce
> 
>> +        */
>> +
>> +       if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0 &&
>> +           IS_ENABLED(CONFIG_EFI_STUB_EXTRACT_DIRECT))
> 
> efi_adjust_memory_range_protection() is a generic routine, but here it
> depends on a x86-specific Kconfig symbol. Is that really necessary?

Since, I will make direct extraction unconditional, I will remove this 
as
well, since it will be always true.

> 
>> +               return EFI_INVALID_PARAMETER;
>> +
>> +       rounded_start = rounddown(start, EFI_PAGE_SIZE);
>> +       rounded_end = roundup(start + size, EFI_PAGE_SIZE);
>> +
>> +       if (!efi_mem_attrib_proto) {
>> +               retrive_memory_attributes_proto();
> 
> retrieve
> 
>> +
>> +               /* Fall back to DXE services if unsupported */
>> +               if (!efi_mem_attrib_proto) {
>> +                       return adjust_mem_attrib_dxe(rounded_start,
>> +                                                    rounded_end,
>> +                                                    attributes);
>> +               }
>> +       }
>> +
>> +       /*
>> +        * Unlike DXE services functions, 
>> EFI_MEMORY_ATTRIBUTE_PROTOCOL
>> +        * does not clear unset protection bit, so it needs to be 
>> cleared
>> +        * explcitly
>> +        */
>> +
>> +       attr_clear = ~attributes &
>> +                    (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);
>> +
>> +       status = efi_call_proto(efi_mem_attrib_proto,
>> +                               clear_memory_attributes,
>> +                               rounded_start,
>> +                               rounded_end - rounded_start,
>> +                               attr_clear);
>> +       if (status != EFI_SUCCESS) {
>> +               efi_warn("Failed to clear memory attributes at 
>> [%08lx,%08lx]: %lx",
> 
> Need \n at the end here
> 
>> +                        (unsigned long)rounded_start,
>> +                        (unsigned long)rounded_end,
>> +                        status);
>> +               return status;
>> +       }
>> +
>> +       status = efi_call_proto(efi_mem_attrib_proto,
>> +                               set_memory_attributes,
>> +                               rounded_start,
>> +                               rounded_end - rounded_start,
>> +                               attributes);
>> +       if (status != EFI_SUCCESS) {
>> +               efi_warn("Failed to set memory attributes at 
>> [%08lx,%08lx]: %lx",
> 
> and here
> 
>> +                        (unsigned long)rounded_start,
>> +                        (unsigned long)rounded_end,
>> +                        status);
>> +       }
>> +
>> +       return status;
>> +}
>> diff --git a/drivers/firmware/efi/libstub/x86-stub.c 
>> b/drivers/firmware/efi/libstub/x86-stub.c
>> index 914106d547a6..dd1e1e663072 100644
>> --- a/drivers/firmware/efi/libstub/x86-stub.c
>> +++ b/drivers/firmware/efi/libstub/x86-stub.c
>> @@ -22,7 +22,6 @@
>>  #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>> 
>>  const efi_system_table_t *efi_system_table;
>> -const efi_dxe_services_table_t *efi_dxe_table;
>>  extern u32 image_offset;
>>  static efi_loaded_image_t *image = NULL;
>> 
>> @@ -401,15 +400,6 @@ static void setup_sections_memory_protection(void 
>> *image_base,
>>                                              unsigned long init_size)
>>  {
>>  #ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
>> -       efi_dxe_table = 
>> get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>> -
>> -       if (!efi_dxe_table ||
>> -           efi_dxe_table->hdr.signature != 
>> EFI_DXE_SERVICES_TABLE_SIGNATURE) {
>> -               efi_warn("Unable to locate EFI DXE services table\n");
>> -               efi_dxe_table = NULL;
>> -               return;
>> -       }
>> -
>>         extern char _head[], _ehead[];
>>         extern char _compressed[], _ecompressed[];
>>         extern char _text[], _etext[];
>> @@ -791,13 +781,6 @@ unsigned long efi_main(efi_handle_t handle,
>>         if (efi_system_table->hdr.signature != 
>> EFI_SYSTEM_TABLE_SIGNATURE)
>>                 efi_exit(handle, EFI_INVALID_PARAMETER);
>> 
>> -       efi_dxe_table = 
>> get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>> -       if (efi_dxe_table &&
>> -           efi_dxe_table->hdr.signature != 
>> EFI_DXE_SERVICES_TABLE_SIGNATURE) {
>> -               efi_warn("Ignoring DXE services table: invalid 
>> signature\n");
>> -               efi_dxe_table = NULL;
>> -       }
>> -
>>  #ifndef CONFIG_EFI_STUB_EXTRACT_DIRECT
>>         /*
>>          * If the kernel isn't already loaded at a suitable address,
>> --
>> 2.35.1
>> 

Thanks, for nitpicks, I will fix all the typos. \n's are fixed by 
Peter's patch,
so I will rely on it.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline
  2022-10-19  7:44   ` Andrew Cooper
@ 2022-10-20 13:25     ` Evgeniy Baskov
  0 siblings, 0 replies; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 13:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ard Biesheuvel, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:44, Andrew Cooper wrote:
> On 06/09/2022 11:41, Evgeniy Baskov wrote:
>> diff --git a/arch/x86/boot/compressed/head_64.S 
>> b/arch/x86/boot/compressed/head_64.S
>> index d33f060900d2..5273367283b7 100644
>> --- a/arch/x86/boot/compressed/head_64.S
>> +++ b/arch/x86/boot/compressed/head_64.S
>> @@ -619,9 +619,8 @@ SYM_CODE_START(trampoline_32bit_src)
>>  	/* Set up new stack */
>>  	leal	TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
>> 
>> -	/* Disable paging */
>> -	movl	%cr0, %eax
>> -	btrl	$X86_CR0_PG_BIT, %eax
>> +	/* Disable paging and setup CR0 */
>> +	movl	$(CR0_STATE & ~X86_CR0_PG), %eax
> 
> Why here?  WP is ignored when PG is disabled.
> 
> ~Andrew

PG is enabled lower in this function, so WP can also be set there,
it should not make any difference. The only important thing is that
WP supposed to be set in trampoline code.

If you think, that it would be more logical to set PG and WP
simultaneously, I can change it to be that way.

Thanks,
Evgeniy Baskov

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 06/16] x86/boot: Setup memory protection for bzImage code
  2022-10-19  7:57   ` Andrew Cooper
@ 2022-10-20 13:30     ` Evgeniy Baskov
  2022-10-20 16:51       ` Andrew Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Evgeniy Baskov @ 2022-10-20 13:30 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ard Biesheuvel, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening

On 2022-10-19 10:57, Andrew Cooper wrote:
> On 06/09/2022 11:41, Evgeniy Baskov wrote:
>> diff --git a/arch/x86/boot/compressed/head_64.S 
>> b/arch/x86/boot/compressed/head_64.S
>> index 5273367283b7..889ca7176aa7 100644
>> --- a/arch/x86/boot/compressed/head_64.S
>> +++ b/arch/x86/boot/compressed/head_64.S
>> @@ -602,6 +603,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>  	jmp	*%rax
>>  SYM_FUNC_END(.Lrelocated)
>> 
>> +SYM_FUNC_START_LOCAL_NOALIGN(startup32_enable_nx_if_supported)
>> +	pushq	%rbx
>> +
>> +	leaq	has_nx(%rip), %rcx
>> +
>> +	mov	$0x80000001, %eax
>> +	cpuid
>> +	btl	$20, %edx
> 
> btl $(X86_FEATURE_NX & 31), %edx
> 
> But also need to check for the availability of the extended leaf in the
> first place.

Yes, thank you for suggestion, that looks more readable. I will
also add the leaf node check. Is there any processor though that
supports long mode and does not support 0x80000001 leaf node?

> 
>> +	jnc	.Lnonx
>> +
>> +	movl	$1, (%rcx)
> 
> Your pointer has been clobbered with some feature flags.

Thanks, I apparently forgot to include fix for this into a patch set...

> 
> movl $1, has_nx(%rip)
> 
> will work fine without needing the intermediary lea.
> 
> ~Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 06/16] x86/boot: Setup memory protection for bzImage code
  2022-10-20 13:30     ` Evgeniy Baskov
@ 2022-10-20 16:51       ` Andrew Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Andrew Cooper @ 2022-10-20 16:51 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Ard Biesheuvel, Borislav Petkov, Andy Lutomirski, Dave Hansen,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Alexey Khoroshilov,
	lvc-project, x86, linux-efi, linux-kernel, linux-hardening,
	Andrew Cooper

On 20/10/2022 14:30, Evgeniy Baskov wrote:
> On 2022-10-19 10:57, Andrew Cooper wrote:
>> On 06/09/2022 11:41, Evgeniy Baskov wrote:
>>> diff --git a/arch/x86/boot/compressed/head_64.S
>>> b/arch/x86/boot/compressed/head_64.S
>>> index 5273367283b7..889ca7176aa7 100644
>>> --- a/arch/x86/boot/compressed/head_64.S
>>> +++ b/arch/x86/boot/compressed/head_64.S
>>> @@ -602,6 +603,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>>      jmp    *%rax
>>>  SYM_FUNC_END(.Lrelocated)
>>>
>>> +SYM_FUNC_START_LOCAL_NOALIGN(startup32_enable_nx_if_supported)
>>> +    pushq    %rbx
>>> +
>>> +    leaq    has_nx(%rip), %rcx
>>> +
>>> +    mov    $0x80000001, %eax
>>> +    cpuid
>>> +    btl    $20, %edx
>>
>> btl $(X86_FEATURE_NX & 31), %edx
>>
>> But also need to check for the availability of the extended leaf in the
>> first place.
>
> Yes, thank you for suggestion, that looks more readable. I will
> also add the leaf node check. Is there any processor though that
> supports long mode and does not support 0x80000001 leaf node?

No, good point.  The Long Mode feature bit is in this leaf too.

~Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2022-10-20 16:51 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-06 10:41 [PATCH 00/16] x86_64: Improvements at compressed kernel stage Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 01/16] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
2022-10-19  7:01   ` Ard Biesheuvel
2022-10-20 11:13     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 02/16] x86/build: Remove RWX sections and align on 4KB Evgeniy Baskov
2022-10-19  7:04   ` Ard Biesheuvel
2022-10-20 11:15     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 03/16] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
2022-10-19  7:06   ` Ard Biesheuvel
2022-10-20 11:23     ` Evgeniy Baskov
2022-10-19  7:44   ` Andrew Cooper
2022-10-20 13:25     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 04/16] x86/boot: Increase boot page table size Evgeniy Baskov
2022-10-19  7:08   ` Ard Biesheuvel
2022-10-20 11:29     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 05/16] x86/boot: Support 4KB pages for identity mapping Evgeniy Baskov
2022-10-19  7:11   ` Ard Biesheuvel
2022-10-20 11:30     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 06/16] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
2022-10-19  7:17   ` Ard Biesheuvel
2022-10-20 12:07     ` Evgeniy Baskov
2022-10-19  7:57   ` Andrew Cooper
2022-10-20 13:30     ` Evgeniy Baskov
2022-10-20 16:51       ` Andrew Cooper
2022-09-06 10:41 ` [PATCH 07/16] x86/boot: Map memory explicitly Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 08/16] x86/boot: Remove mapping from page fault handler Evgeniy Baskov
2022-10-19  7:20   ` Ard Biesheuvel
2022-09-06 10:41 ` [PATCH 09/16] efi/libstub: Move helper function to related file Evgeniy Baskov
2022-10-19  7:21   ` Ard Biesheuvel
2022-09-06 10:41 ` [PATCH 10/16] x86/boot: Make console interface more abstract Evgeniy Baskov
2022-10-19  7:23   ` Ard Biesheuvel
2022-10-20 12:10     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 11/16] x86/boot: Split trampoline and pt init code Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 12/16] x86/boot: Add EFI kernel extraction interface Evgeniy Baskov
2022-10-19  7:27   ` Ard Biesheuvel
2022-10-20 12:14     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 13/16] efi/x86: Support extracting kernel from libstub Evgeniy Baskov
2022-10-19  7:35   ` Ard Biesheuvel
2022-10-20 12:36     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 14/16] x86/build: Make generated PE more spec compliant Evgeniy Baskov
2022-10-19  7:39   ` Ard Biesheuvel
2022-10-20 13:07     ` Evgeniy Baskov
2022-09-06 10:41 ` [PATCH 15/16] efi/libstub: Add memory attribute protocol definitions Evgeniy Baskov
2022-10-19  7:39   ` Ard Biesheuvel
2022-09-06 10:41 ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Evgeniy Baskov
2022-10-18 20:51   ` [PATCH] efi/libstub: make memory protection warnings include newlines Peter Jones
2022-10-19  7:44     ` Ard Biesheuvel
2022-10-19  7:42   ` [PATCH 16/16] efi/libstub: Use memory attribute protocol Ard Biesheuvel
2022-10-20 13:13     ` Evgeniy Baskov
2022-10-18 21:04 ` [PATCH 00/16] x86_64: Improvements at compressed kernel stage Peter Jones
2022-10-20 11:05   ` Evgeniy Baskov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).