linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
@ 2022-08-01 16:38 Evgeniy Baskov
  2022-08-01 16:38 ` [PATCH 1/8] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

This is the first half of the changes aimed at increasing the security of
the early boot code of the compressed kernel for x86_64 by enforcing memory
protection at the page table level.

It applies memory protection to the compressed kernel code executing
outside the EFI environment and makes all identity mappings explicit,
reducing the chance of erroneous memory accesses going unnoticed.

The second half makes the kernel a more compliant PE image and enforces
memory protection for the EFISTUB code, thus completing W^X support for
the compressed kernel.

I'll send the second half for review later.

Evgeniy Baskov (8):
  x86/boot: Align vmlinuz sections on page size
  x86/build: Remove RWX sections and align on 4KB
  x86/boot: Set cr0 to known state in trampoline
  x86/boot: Increase boot page table size
  x86/boot: Support 4KB pages for identity mapping
  x86/boot: Setup memory protection for bzImage code
  x86/boot: Map memory explicitly
  x86/boot: Remove mapping from page fault handler

 arch/x86/boot/compressed/acpi.c         |  21 ++-
 arch/x86/boot/compressed/efi.c          |  19 ++-
 arch/x86/boot/compressed/head_64.S      |   7 +-
 arch/x86/boot/compressed/ident_map_64.c | 128 ++++++++++------
 arch/x86/boot/compressed/kaslr.c        |   4 +
 arch/x86/boot/compressed/misc.c         |  52 ++++++-
 arch/x86/boot/compressed/misc.h         |  16 +-
 arch/x86/boot/compressed/pgtable.h      |  20 ---
 arch/x86/boot/compressed/pgtable_64.c   |   2 +-
 arch/x86/boot/compressed/sev.c          |   6 +-
 arch/x86/boot/compressed/vmlinux.lds.S  |   6 +
 arch/x86/include/asm/boot.h             |  26 ++--
 arch/x86/include/asm/init.h             |   1 +
 arch/x86/include/asm/shared/pgtable.h   |  29 ++++
 arch/x86/kernel/vmlinux.lds.S           |  15 +-
 arch/x86/mm/ident_map.c                 | 186 ++++++++++++++++++++----
 16 files changed, 403 insertions(+), 135 deletions(-)
 delete mode 100644 arch/x86/boot/compressed/pgtable.h
 create mode 100644 arch/x86/include/asm/shared/pgtable.h

-- 
2.35.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/8] x86/boot: Align vmlinuz sections on page size
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
@ 2022-08-01 16:38 ` Evgeniy Baskov
  2022-08-01 16:38 ` [PATCH 2/8] x86/build: Remove RWX sections and align on 4KB Evgeniy Baskov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

To protect sections at the page table level, each section
needs to be aligned on the page size (4KB).

Set the section alignment in the linker script.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 112b2375d021..6be90f1a1198 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -27,21 +27,27 @@ SECTIONS
 		HEAD_TEXT
 		_ehead = . ;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.rodata..compressed : {
+		_compressed = .;
 		*(.rodata..compressed)
+		_ecompressed = .;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.text :	{
 		_text = .; 	/* Text */
 		*(.text)
 		*(.text.*)
 		_etext = . ;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.rodata : {
 		_rodata = . ;
 		*(.rodata)	 /* read-only data */
 		*(.rodata.*)
 		_erodata = . ;
 	}
+	. = ALIGN(PAGE_SIZE);
 	.data :	{
 		_data = . ;
 		*(.data)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/8] x86/build: Remove RWX sections and align on 4KB
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
  2022-08-01 16:38 ` [PATCH 1/8] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
@ 2022-08-01 16:38 ` Evgeniy Baskov
  2022-08-01 16:39 ` [PATCH 3/8] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

Avoid creating sections with maximal privileges to prepare for the W^X
implementation. Align sections on the page size (4KB) to allow protecting
them at the page table level.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 15f29053cec4..6587e0201b50 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -102,12 +102,11 @@ jiffies = jiffies_64;
 PHDRS {
 	text PT_LOAD FLAGS(5);          /* R_E */
 	data PT_LOAD FLAGS(6);          /* RW_ */
-#ifdef CONFIG_X86_64
-#ifdef CONFIG_SMP
+#if defined(CONFIG_X86_64) && defined(CONFIG_SMP)
 	percpu PT_LOAD FLAGS(6);        /* RW_ */
 #endif
-	init PT_LOAD FLAGS(7);          /* RWE */
-#endif
+	inittext PT_LOAD FLAGS(5);      /* R_E */
+	init PT_LOAD FLAGS(6);          /* RW_ */
 	note PT_NOTE FLAGS(0);          /* ___ */
 }
 
@@ -226,9 +225,10 @@ SECTIONS
 #endif
 
 	INIT_TEXT_SECTION(PAGE_SIZE)
-#ifdef CONFIG_X86_64
-	:init
-#endif
+	:inittext
+
+	. = ALIGN(PAGE_SIZE);
+
 
 	/*
 	 * Section for code used exclusively before alternatives are run. All
@@ -240,6 +240,7 @@ SECTIONS
 	.altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
 		*(.altinstr_aux)
 	}
+	:init
 
 	INIT_DATA_SECTION(16)
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/8] x86/boot: Set cr0 to known state in trampoline
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
  2022-08-01 16:38 ` [PATCH 1/8] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
  2022-08-01 16:38 ` [PATCH 2/8] x86/build: Remove RWX sections and align on 4KB Evgeniy Baskov
@ 2022-08-01 16:39 ` Evgeniy Baskov
  2022-08-01 16:39 ` [PATCH 4/8] x86/boot: Increase boot page table size Evgeniy Baskov
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

Ensure the WP bit is set to prevent boot code from writing to
non-writable memory pages.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
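
For reference, CR0_STATE comes from <asm/processor-flags.h>; at the time
of writing it expands to the architectural default bits shown below, so
loading (CR0_STATE & ~X86_CR0_PG) keeps WP (and the other defaults) set
while paging is temporarily disabled in the trampoline:

	/* <asm/processor-flags.h>, quoted here for reference only */
	#define CR0_STATE	(X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
				 X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
				 X86_CR0_PG)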

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d33f060900d2..5273367283b7 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -619,9 +619,8 @@ SYM_CODE_START(trampoline_32bit_src)
 	/* Set up new stack */
 	leal	TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
 
-	/* Disable paging */
-	movl	%cr0, %eax
-	btrl	$X86_CR0_PG_BIT, %eax
+	/* Disable paging and setup CR0 */
+	movl	$(CR0_STATE & ~X86_CR0_PG), %eax
 	movl	%eax, %cr0
 
 	/* Check what paging mode we want to be in after the trampoline */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/8] x86/boot: Increase boot page table size
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
                   ` (2 preceding siblings ...)
  2022-08-01 16:39 ` [PATCH 3/8] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
@ 2022-08-01 16:39 ` Evgeniy Baskov
  2022-08-01 16:39 ` [PATCH 5/8] x86/boot: Support 4KB pages for identity mapping Evgeniy Baskov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

Previous calculations ignored pages implicitly mapped by the ACPI code,
so the theoretical upper limit is higher than the value that was set.

Using 4KB pages is desirable for better memory protection granularity,
but approximately twice as much page table memory is required for them.

Increase the initial page table size to 64 4KB pages.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
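
For reference, the worst-case estimate from the comment added below
(with S = 6 kernel sections, as assumed there) adds up as:

	  1                 page  for level4
	+ (3+3)*2    = 12   pages for param and cmd_line
	+ (2+2+6)*2  = 20   pages for kernel and randomized kernel
	+ 3                 pages for the first 2M (video RAM)
	-------------------------------------------------------
	  36                pages in total

The 64 pages reserved here leave headroom on top of that for the UEFI
memory map and ACPI table mappings, whose exact worst case is not
accounted for yet.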

diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3..024d972c248e 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -41,22 +41,24 @@
 # define BOOT_STACK_SIZE	0x4000
 
 # define BOOT_INIT_PGT_SIZE	(6*4096)
-# ifdef CONFIG_RANDOMIZE_BASE
 /*
  * Assuming all cross the 512GB boundary:
  * 1 page for level4
- * (2+2)*4 pages for kernel, param, cmd_line, and randomized kernel
- * 2 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
- * Total is 19 pages.
+ * (3+3)*2 pages for param and cmd_line
+ * (2+2+S)*2 pages for kernel and randomized kernel, where S is total number
+ *     of sections of kernel. Explanation: 2+2 are upper level page tables.
+ *     We can have only S unaligned parts of section: 1 at the end of the kernel
+ *     and (S-1) at the section borders. The start address of the kernel is
+ *     aligned, so an extra page table. There are at most S=6 sections in
+ *     vmlinux ELF image.
+ * 3 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
+ * Total is 36 pages.
+ *
+ * Some pages are also required for UEFI memory map and
+ * ACPI table mappings, so we need to add extra space.
+ * FIXME: Figure out exact amount of pages.
  */
-#  ifdef CONFIG_X86_VERBOSE_BOOTUP
-#   define BOOT_PGT_SIZE	(19*4096)
-#  else /* !CONFIG_X86_VERBOSE_BOOTUP */
-#   define BOOT_PGT_SIZE	(17*4096)
-#  endif
-# else /* !CONFIG_RANDOMIZE_BASE */
-#  define BOOT_PGT_SIZE		BOOT_INIT_PGT_SIZE
-# endif
+# define BOOT_PGT_SIZE		(64*4096)
 
 #else /* !CONFIG_X86_64 */
 # define BOOT_STACK_SIZE	0x1000
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/8] x86/boot: Support 4KB pages for identity mapping
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
                   ` (3 preceding siblings ...)
  2022-08-01 16:39 ` [PATCH 4/8] x86/boot: Increase boot page table size Evgeniy Baskov
@ 2022-08-01 16:39 ` Evgeniy Baskov
  2022-08-01 16:39 ` [PATCH 6/8] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

The current identity mapping code supports only 2M and 1G pages.
4KB pages are desirable for better memory protection granularity
in the compressed kernel code.

Change the identity mapping code to support 4KB pages and
remapping of already mapped memory with different attributes.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
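
A hypothetical caller sketch (illustration only, not part of this patch;
roughly what the compressed kernel does later in this series), showing
how the new allow_4kpages field is meant to be used together with
kernel_ident_mapping_init(). The alloc_pgt_page()/pgt_data helpers and
error() are the ones already present in the compressed kernel; on SME/SEV
systems the encryption mask would also be ORed into page_flag:

	static void example_map_range_ro(unsigned long top_level_pgt,
					 unsigned long start, unsigned long end)
	{
		struct x86_mapping_info info = {
			.alloc_pgt_page	= alloc_pgt_page,   /* page table page allocator */
			.context	= &pgt_data,
			.page_flag	= __PAGE_KERNEL_RO, /* leaf attributes, no _PSE */
			.kernpg_flag	= _KERNPG_TABLE,
			.allow_4kpages	= true,             /* new in this patch */
		};

		/* Maps [start, end) read-only, splitting large pages as needed */
		if (kernel_ident_mapping_init(&info, (pgd_t *)top_level_pgt,
					      start, end))
			error("Error: kernel_ident_mapping_init() failed\n");
	}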

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 5f1d3c421f68..a8277ee82c51 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -8,6 +8,7 @@ struct x86_mapping_info {
 	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
 	unsigned long offset;		 /* ident mapping offset */
 	bool direct_gbpages;		 /* PUD level 1GB page support */
+	bool allow_4kpages;		 /* Allow more granular mappings with 4K pages */
 	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
 };
 
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..177cd43c8db9 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -2,26 +2,130 @@
 /*
  * Helper routines for building identity mapping page tables. This is
  * included by both the compressed kernel and the regular kernel.
+ *
  */
 
-static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
-			   unsigned long addr, unsigned long end)
+static void ident_pte_init(struct x86_mapping_info *info, pte_t *pte_page,
+			   unsigned long addr, unsigned long end,
+			   unsigned long flags)
 {
-	addr &= PMD_MASK;
-	for (; addr < end; addr += PMD_SIZE) {
+	addr &= PAGE_MASK;
+	for (; addr < end; addr += PAGE_SIZE) {
+		pte_t *pte = pte_page + pte_index(addr);
+
+		set_pte(pte, __pte((addr - info->offset) | flags));
+	}
+}
+
+pte_t *ident_split_large_pmd(struct x86_mapping_info *info,
+			     pmd_t *pmdp, unsigned long page_addr)
+{
+	unsigned long pmd_addr, page_flags;
+	pte_t *pte;
+
+	pte = (pte_t *)info->alloc_pgt_page(info->context);
+	if (!pte)
+		return NULL;
+
+	pmd_addr = page_addr & PMD_MASK;
+
+	/* Not a large page - clear PSE flag */
+	page_flags = pmd_flags(*pmdp) & ~_PSE;
+	ident_pte_init(info, pte, pmd_addr, pmd_addr + PMD_SIZE, page_flags);
+
+	return pte;
+}
+
+static int ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
+			  unsigned long addr, unsigned long end,
+			  unsigned long flags)
+{
+	unsigned long next;
+	bool new_table = 0;
+
+	for (; addr < end; addr = next) {
 		pmd_t *pmd = pmd_page + pmd_index(addr);
+		pte_t *pte;
 
-		if (pmd_present(*pmd))
+		next = (addr & PMD_MASK) + PMD_SIZE;
+		if (next > end)
+			next = end;
+
+		/*
+		 * Use 2M pages if 4k pages are not allowed or
+		 * we are not mapping extra, i.e. address and size are aligned.
+		 */
+
+		if (!info->allow_4kpages ||
+		    (!(addr & PMD_MASK) && next == addr + PMD_SIZE)) {
+
+			pmd_t pmdval;
+
+			addr &= PMD_MASK;
+			pmdval = __pmd((addr - info->offset) | flags | _PSE);
+			set_pmd(pmd, pmdval);
 			continue;
+		}
+
+		/*
+		 * If the currently mapped page is large, we need to split it.
+		 * The case where we can just remap a 2M page to a 2M page
+		 * with different flags is already covered above.
+		 *
+		 * If there's nothing mapped at the desired address,
+		 * we need to allocate a new page table.
+		 */
 
-		set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
+		if (pmd_large(*pmd)) {
+			pte = ident_split_large_pmd(info, pmd, addr);
+			new_table = 1;
+		} else if (!pmd_present(*pmd)) {
+			pte = (pte_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
+			pte = pte_offset_kernel(pmd, 0);
+			new_table = 0;
+		}
+
+		if (!pte)
+			return -ENOMEM;
+
+		ident_pte_init(info, pte, addr, next, flags);
+
+		if (new_table)
+			set_pmd(pmd, __pmd(__pa(pte) | info->kernpg_flag));
 	}
+
+	return 0;
 }
 
+
+pmd_t *ident_split_large_pud(struct x86_mapping_info *info,
+			     pud_t *pudp, unsigned long page_addr)
+{
+	unsigned long pud_addr, page_flags;
+	pmd_t *pmd;
+
+	pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+	if (!pmd)
+		return NULL;
+
+	pud_addr = page_addr & PUD_MASK;
+
+	/* Not a large page - clear PSE flag */
+	page_flags = pud_flags(*pudp) & ~_PSE;
+	ident_pmd_init(info, pmd, pud_addr, pud_addr + PUD_SIZE, page_flags);
+
+	return pmd;
+}
+
+
 static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 			  unsigned long addr, unsigned long end)
 {
 	unsigned long next;
+	bool new_table = 0;
+	int result;
 
 	for (; addr < end; addr = next) {
 		pud_t *pud = pud_page + pud_index(addr);
@@ -31,28 +135,39 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		if (next > end)
 			next = end;
 
+		/* Use 1G pages only if forced, even if they are supported. */
 		if (info->direct_gbpages) {
 			pud_t pudval;
-
-			if (pud_present(*pud))
-				continue;
+			unsigned long flags;
 
 			addr &= PUD_MASK;
-			pudval = __pud((addr - info->offset) | info->page_flag);
+			flags = info->page_flag | _PSE;
+			pudval = __pud((addr - info->offset) | flags);
+
 			set_pud(pud, pudval);
 			continue;
 		}
 
-		if (pud_present(*pud)) {
+		if (pud_large(*pud)) {
+			pmd = ident_split_large_pud(info, pud, addr);
+			new_table = 1;
+		} else if (!pud_present(*pud)) {
+			pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
 			pmd = pmd_offset(pud, 0);
-			ident_pmd_init(info, pmd, addr, next);
-			continue;
+			new_table = 0;
 		}
-		pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+
 		if (!pmd)
 			return -ENOMEM;
-		ident_pmd_init(info, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
+
+		result = ident_pmd_init(info, pmd, addr, next, info->page_flag);
+		if (result)
+			return result;
+
+		if (new_table)
+			set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
 	}
 
 	return 0;
@@ -63,6 +178,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 {
 	unsigned long next;
 	int result;
+	bool new_table = 0;
 
 	for (; addr < end; addr = next) {
 		p4d_t *p4d = p4d_page + p4d_index(addr);
@@ -72,15 +188,14 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 		if (next > end)
 			next = end;
 
-		if (p4d_present(*p4d)) {
+		if (!p4d_present(*p4d)) {
+			pud = (pud_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
 			pud = pud_offset(p4d, 0);
-			result = ident_pud_init(info, pud, addr, next);
-			if (result)
-				return result;
-
-			continue;
+			new_table = 0;
 		}
-		pud = (pud_t *)info->alloc_pgt_page(info->context);
+
 		if (!pud)
 			return -ENOMEM;
 
@@ -88,19 +203,22 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 		if (result)
 			return result;
 
-		set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
+		if (new_table)
+			set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
 	}
 
 	return 0;
 }
 
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
-			      unsigned long pstart, unsigned long pend)
+int kernel_ident_mapping_init(struct x86_mapping_info *info,
+			      pgd_t *pgd_page, unsigned long pstart,
+			      unsigned long pend)
 {
 	unsigned long addr = pstart + info->offset;
 	unsigned long end = pend + info->offset;
 	unsigned long next;
 	int result;
+	bool new_table;
 
 	/* Set the default pagetable flags if not supplied */
 	if (!info->kernpg_flag)
@@ -117,20 +235,24 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 		if (next > end)
 			next = end;
 
-		if (pgd_present(*pgd)) {
+		if (!pgd_present(*pgd)) {
+			p4d = (p4d_t *)info->alloc_pgt_page(info->context);
+			new_table = 1;
+		} else {
 			p4d = p4d_offset(pgd, 0);
-			result = ident_p4d_init(info, p4d, addr, next);
-			if (result)
-				return result;
-			continue;
+			new_table = 0;
 		}
 
-		p4d = (p4d_t *)info->alloc_pgt_page(info->context);
 		if (!p4d)
 			return -ENOMEM;
+
 		result = ident_p4d_init(info, p4d, addr, next);
 		if (result)
 			return result;
+
+		if (!new_table)
+			continue;
+
 		if (pgtable_l5_enabled()) {
 			set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag));
 		} else {
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/8] x86/boot: Setup memory protection for bzImage code
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
                   ` (4 preceding siblings ...)
  2022-08-01 16:39 ` [PATCH 5/8] x86/boot: Support 4KB pages for identity mapping Evgeniy Baskov
@ 2022-08-01 16:39 ` Evgeniy Baskov
  2022-08-01 16:39 ` [PATCH 7/8] x86/boot: Map memory explicitly Evgeniy Baskov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

Use the previously added code to map memory with 4KB pages. Map the
compressed and uncompressed kernel with appropriate memory protection
attributes. For the compressed kernel, set them up manually. For the
uncompressed kernel, use the flags specified in its ELF header.

Move 'boot/compressed/pgtable.h' to the shared headers to make it
accessible from the EFISTUB code.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

 delete mode 100644 arch/x86/boot/compressed/pgtable.h
 create mode 100644 arch/x86/include/asm/shared/pgtable.h
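
For reference, the intended meaning of the new flag bits, illustrated
with calls this patch adds (a writable and executable combination is
rejected as a W^X violation):

	/* No flags: read-only, non-executable mapping */
	kernel_add_identity_map((unsigned long)_rodata,
				(unsigned long)_erodata, MAP_NOFLUSH);

	/* MAP_WRITE: read-write data */
	kernel_add_identity_map((unsigned long)_data,
				(unsigned long)_end, MAP_WRITE | MAP_NOFLUSH);

	/* MAP_EXEC: read-only, executable code */
	kernel_add_identity_map((unsigned long)_text,
				(unsigned long)_etext, MAP_EXEC | MAP_NOFLUSH);

	/* MAP_EXEC | MAP_WRITE triggers error("Error: W^X violation\n") */

MAP_NOFLUSH avoids the TLB flush after the mapping is built, MAP_ALLOC
marks ranges that still need to be allocated, and MAP_PROTECT requests
that the exact memory attributes be set for the range.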

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 5273367283b7..4cc1463b98e8 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -35,7 +35,7 @@
 #include <asm/bootparam.h>
 #include <asm/desc_defs.h>
 #include <asm/trapnr.h>
-#include "pgtable.h"
+#include <asm/shared/pgtable.h>
 
 /*
  * Locally defined symbols should be marked hidden:
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index d4a314cc50d6..04022c080114 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -28,6 +28,7 @@
 #include <asm/trap_pf.h>
 #include <asm/trapnr.h>
 #include <asm/init.h>
+#include <asm/shared/pgtable.h>
 /* Use the static base for this part of the boot process */
 #undef __PAGE_OFFSET
 #define __PAGE_OFFSET __PAGE_OFFSET_BASE
@@ -86,24 +87,45 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
  * Due to relocation, pointers must be assigned at run time not build time.
  */
 static struct x86_mapping_info mapping_info;
+static bool has_nx;
 
 /*
  * Adds the specified range to the identity mappings.
  */
-void kernel_add_identity_map(unsigned long start, unsigned long end)
+unsigned long kernel_add_identity_map(unsigned long start,
+				      unsigned long end,
+				      unsigned int flags)
 {
 	int ret;
 
-	/* Align boundary to 2M. */
+	/* Align boundaries to the page size. */
-	start = round_down(start, PMD_SIZE);
-	end = round_up(end, PMD_SIZE);
+	start = round_down(start, PAGE_SIZE);
+	end = round_up(end, PAGE_SIZE);
 	if (start >= end)
-		return;
+		return start;
+
+	/* Enforce W^X -- just stop booting with error on violation. */
+	if ((flags & (MAP_EXEC | MAP_WRITE)) == (MAP_EXEC | MAP_WRITE))
+		error("Error: W^X violation\n");
+
+	bool nx = !(flags & MAP_EXEC) && has_nx;
+	bool ro = !(flags & MAP_WRITE);
+
+	mapping_info.page_flag = sme_me_mask | (nx ?
+		(ro ? __PAGE_KERNEL_RO : __PAGE_KERNEL) :
+		(ro ? __PAGE_KERNEL_ROX : __PAGE_KERNEL_EXEC));
 
 	/* Build the mapping. */
-	ret = kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, start, end);
+	ret = kernel_ident_mapping_init(&mapping_info,
+					(pgd_t *)top_level_pgt,
+					start, end);
 	if (ret)
 		error("Error: kernel_ident_mapping_init() failed\n");
+
+	if (!(flags & MAP_NOFLUSH))
+		write_cr3(top_level_pgt);
+
+	return start;
 }
 
 /* Locates and clears a region for a new top level page table. */
@@ -112,14 +134,17 @@ void initialize_identity_maps(void *rmode)
 	unsigned long cmdline;
 	struct setup_data *sd;
 
+	boot_params = rmode;
+
 	/* Exclude the encryption mask from __PHYSICAL_MASK */
 	physical_mask &= ~sme_me_mask;
 
 	/* Init mapping_info with run-time function/buffer pointers. */
 	mapping_info.alloc_pgt_page = alloc_pgt_page;
 	mapping_info.context = &pgt_data;
-	mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask;
+	mapping_info.page_flag = __PAGE_KERNEL_EXEC | sme_me_mask;
 	mapping_info.kernpg_flag = _KERNPG_TABLE;
+	mapping_info.allow_4kpages = 1;
 
 	/*
 	 * It should be impossible for this not to already be true,
@@ -151,18 +176,46 @@ void initialize_identity_maps(void *rmode)
 		top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
 	}
 
+	/*
+	 * Check if this CPU supports NX flag and use
+	 * it appropriately for identity mappings.
+	 */
+
+	has_nx = native_cpuid_edx(0x80000001) & (1 << 20);
+	if (!has_nx)
+		debug_putstr("NX bit is not supported.\n");
+
 	/*
 	 * New page-table is set up - map the kernel image, boot_params and the
 	 * command line. The uncompressed kernel requires boot_params and the
-	 * command line to be mapped in the identity mapping. Map them
-	 * explicitly here in case the compressed kernel does not touch them,
-	 * or does not touch all the pages covering them.
+	 * command line to be mapped in the identity mapping.
+	 * Every other accessed memory region is mapped later, if required.
 	 */
-	kernel_add_identity_map((unsigned long)_head, (unsigned long)_end);
-	boot_params = rmode;
-	kernel_add_identity_map((unsigned long)boot_params, (unsigned long)(boot_params + 1));
+	extern char _head[], _ehead[];
+	kernel_add_identity_map((unsigned long)_head,
+				(unsigned long)_ehead, MAP_EXEC | MAP_NOFLUSH);
+
+	extern char _compressed[], _ecompressed[];
+	kernel_add_identity_map((unsigned long)_compressed,
+				(unsigned long)_ecompressed, MAP_WRITE | MAP_NOFLUSH);
+
+	extern char _text[], _etext[];
+	kernel_add_identity_map((unsigned long)_text,
+				(unsigned long)_etext, MAP_EXEC | MAP_NOFLUSH);
+
+	extern char _rodata[], _erodata[];
+	kernel_add_identity_map((unsigned long)_rodata,
+				(unsigned long)_erodata, MAP_NOFLUSH);
+
+	extern char _data[], _end[];
+	kernel_add_identity_map((unsigned long)_data,
+				(unsigned long)_end, MAP_WRITE | MAP_NOFLUSH);
+
+	kernel_add_identity_map((unsigned long)boot_params,
+				(unsigned long)(boot_params + 1), MAP_WRITE | MAP_NOFLUSH);
+
 	cmdline = get_cmd_line_ptr();
-	kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
+	kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE, MAP_NOFLUSH);
 
 	/*
 	 * Also map the setup_data entries passed via boot_params in case they
@@ -172,7 +225,7 @@ void initialize_identity_maps(void *rmode)
 	while (sd) {
 		unsigned long sd_addr = (unsigned long)sd;
 
-		kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);
+		kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len, MAP_NOFLUSH);
 		sd = (struct setup_data *)sd->next;
 	}
 
@@ -185,26 +238,11 @@ void initialize_identity_maps(void *rmode)
 static pte_t *split_large_pmd(struct x86_mapping_info *info,
 			      pmd_t *pmdp, unsigned long __address)
 {
-	unsigned long page_flags;
-	unsigned long address;
-	pte_t *pte;
-	pmd_t pmd;
-	int i;
-
-	pte = (pte_t *)info->alloc_pgt_page(info->context);
+	unsigned long address = __address & PMD_MASK;
+	pte_t *pte = ident_split_large_pmd(info, pmdp, address);
 	if (!pte)
 		return NULL;
 
-	address     = __address & PMD_MASK;
-	/* No large page - clear PSE flag */
-	page_flags  = info->page_flag & ~_PAGE_PSE;
-
-	/* Populate the PTEs */
-	for (i = 0; i < PTRS_PER_PMD; i++) {
-		set_pte(&pte[i], __pte(address | page_flags));
-		address += PAGE_SIZE;
-	}
-
 	/*
 	 * Ideally we need to clear the large PMD first and do a TLB
 	 * flush before we write the new PMD. But the 2M range of the
@@ -214,7 +252,7 @@ static pte_t *split_large_pmd(struct x86_mapping_info *info,
 	 * also the only user of the page-table, so there is no chance
 	 * of a TLB multihit.
 	 */
-	pmd = __pmd((unsigned long)pte | info->kernpg_flag);
+	pmd_t pmd = __pmd((unsigned long)pte | info->kernpg_flag);
 	set_pmd(pmdp, pmd);
 	/* Flush TLB to establish the new PMD */
 	write_cr3(top_level_pgt);
@@ -377,5 +415,5 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 * Error code is sane - now identity map the 2M region around
 	 * the faulting address.
 	 */
-	kernel_add_identity_map(address, end);
+	kernel_add_identity_map(address, end, MAP_WRITE);
 }
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cf690d8712f4..49f6cc7a7bde 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -14,10 +14,10 @@
 
 #include "misc.h"
 #include "error.h"
-#include "pgtable.h"
 #include "../string.h"
 #include "../voffset.h"
 #include <asm/bootparam_utils.h>
+#include <asm/shared/pgtable.h>
 
 /*
  * WARNING!!
@@ -277,7 +277,8 @@ static inline void handle_relocations(void *output, unsigned long output_len,
 { }
 #endif
 
-static void parse_elf(void *output)
+static void parse_elf(void *output, unsigned long output_len,
+		      unsigned long virt_addr)
 {
 #ifdef CONFIG_X86_64
 	Elf64_Ehdr ehdr;
@@ -287,6 +288,7 @@ static void parse_elf(void *output)
 	Elf32_Phdr *phdrs, *phdr;
 #endif
 	void *dest;
+	unsigned long addr;
 	int i;
 
 	memcpy(&ehdr, output, sizeof(ehdr));
@@ -323,7 +325,43 @@ static void parse_elf(void *output)
 #endif
 			memmove(dest, output + phdr->p_offset, phdr->p_filesz);
 			break;
-		default: /* Ignore other PT_* */ break;
+		default:
+			/* Ignore other PT_* */
+			break;
+		}
+	}
+
+	handle_relocations(output, output_len, virt_addr);
+
+	for (i = 0; i < ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+#ifdef CONFIG_RELOCATABLE
+			addr = (unsigned long)output;
+			addr += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+			addr = phdr->p_paddr;
+#endif
+			/*
+			 * Simultaneously readable and writable segments are
+			 * violating W^X, and should not be present in vmlinux image.
+			 */
+			if ((phdr->p_flags & (PF_X | PF_W)) == (PF_X | PF_W))
+				error("W^X violation for ELF segment");
+
+			unsigned int flags = MAP_PROTECT;
+			if (phdr->p_flags & PF_X)
+				flags |= MAP_EXEC;
+			if (phdr->p_flags & PF_W)
+				flags |= MAP_WRITE;
+
+			kernel_add_identity_map(addr, addr + phdr->p_memsz, flags);
+			break;
+		default:
+			/* Ignore other PT_* */
+			break;
 		}
 	}
 
@@ -434,6 +472,11 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 				needed_size,
 				&virt_addr);
 
+	unsigned long phys_addr = (unsigned long)output;
+	output = (unsigned char *)kernel_add_identity_map(phys_addr,
+							  phys_addr + needed_size,
+							  MAP_ALLOC | MAP_WRITE);
+
 	/* Validate memory location choices. */
 	if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
 		error("Destination physical address inappropriately aligned");
@@ -456,8 +499,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 	debug_putstr("\nDecompressing Linux... ");
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
-	parse_elf(output);
-	handle_relocations(output, output_len, virt_addr);
+	parse_elf(output, output_len, virt_addr);
 	debug_putstr("done.\nBooting the kernel.\n");
 
 	/* Disable exception handling before booting the kernel */
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 4910bf230d7b..699b87b7813a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -161,8 +161,20 @@ static inline int count_immovable_mem_regions(void) { return 0; }
 #ifdef CONFIG_X86_5LEVEL
 extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
 #endif
-extern void kernel_add_identity_map(unsigned long start, unsigned long end);
-
+#ifdef CONFIG_X86_64
+extern unsigned long kernel_add_identity_map(unsigned long start,
+					     unsigned long end,
+					     unsigned int flags);
+#else
+static inline unsigned long kernel_add_identity_map(unsigned long start,
+						    unsigned long end,
+						    unsigned int flags)
+{
+	(void)flags;
+	(void)end;
+	return start;
+}
+#endif
 /* Used by PAGE_KERN* macros: */
 extern pteval_t __default_kernel_pte_mask;
 
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
deleted file mode 100644
index cc9b2529a086..000000000000
--- a/arch/x86/boot/compressed/pgtable.h
+++ /dev/null
@@ -1,20 +0,0 @@
-#ifndef BOOT_COMPRESSED_PAGETABLE_H
-#define BOOT_COMPRESSED_PAGETABLE_H
-
-#define TRAMPOLINE_32BIT_SIZE		(2 * PAGE_SIZE)
-
-#define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
-
-#define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE	0x80
-
-#define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
-
-#ifndef __ASSEMBLER__
-
-extern unsigned long *trampoline_32bit;
-
-extern void trampoline_32bit_src(void *return_ptr);
-
-#endif /* __ASSEMBLER__ */
-#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff4111b..c7cf5a1059a8 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -2,7 +2,7 @@
 #include "misc.h"
 #include <asm/e820/types.h>
 #include <asm/processor.h>
-#include "pgtable.h"
+#include <asm/shared/pgtable.h>
 #include "../string.h"
 #include "efi.h"
 
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 52f989f6acc2..82a10395cf22 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -13,6 +13,7 @@
 #include "misc.h"
 
 #include <asm/pgtable_types.h>
+#include <asm/shared/pgtable.h>
 #include <asm/sev.h>
 #include <asm/trapnr.h>
 #include <asm/trap_pf.h>
@@ -427,10 +428,11 @@ void sev_prep_identity_maps(unsigned long top_level_pgt)
 		unsigned long cc_info_pa = boot_params->cc_blob_address;
 		struct cc_blob_sev_info *cc_info;
 
-		kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info));
+		kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info), MAP_NOFLUSH);
 
 		cc_info = (struct cc_blob_sev_info *)cc_info_pa;
-		kernel_add_identity_map(cc_info->cpuid_phys, cc_info->cpuid_phys + cc_info->cpuid_len);
+		kernel_add_identity_map(cc_info->cpuid_phys,
+					cc_info->cpuid_phys + cc_info->cpuid_len, MAP_NOFLUSH);
 	}
 
 	sev_verify_cbit(top_level_pgt);
diff --git a/arch/x86/include/asm/shared/pgtable.h b/arch/x86/include/asm/shared/pgtable.h
new file mode 100644
index 000000000000..6527dadf39d6
--- /dev/null
+++ b/arch/x86/include/asm/shared/pgtable.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ASM_SHARED_PAGETABLE_H
+#define ASM_SHARED_PAGETABLE_H
+
+/* Boot time memory mapping flags used in compressed kernel */
+#define MAP_WRITE	0x02 /* Writable memory */
+#define MAP_EXEC	0x04 /* Executable memory */
+#define MAP_ALLOC	0x10 /* Range needs to be allocated */
+#define MAP_PROTECT	0x20 /* Set exact memory attributes for memory range */
+#define MAP_NOFLUSH	0x40 /* Avoid flushing TLB */
+
+#define TRAMPOLINE_32BIT_SIZE		(3 * PAGE_SIZE)
+
+#define TRAMPOLINE_32BIT_PLACEMENT_MAX	(0xA0000)
+
+#define TRAMPOLINE_32BIT_PGTABLE_OFFSET	0
+
+#define TRAMPOLINE_32BIT_CODE_OFFSET	PAGE_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE	0x80
+
+#define TRAMPOLINE_32BIT_STACK_END	TRAMPOLINE_32BIT_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern unsigned long *trampoline_32bit;
+
+extern void trampoline_32bit_src(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* ASM_SHARED_PAGETABLE_H */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/8] x86/boot: Map memory explicitly
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
                   ` (5 preceding siblings ...)
  2022-08-01 16:39 ` [PATCH 6/8] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
@ 2022-08-01 16:39 ` Evgeniy Baskov
  2022-08-01 16:39 ` [PATCH 8/8] x86/boot: Remove mapping from page fault handler Evgeniy Baskov
  2022-08-01 16:48 ` [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Dave Hansen
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

Implicit mappings hide possible memory errors, e.g. allocations for
ACPI tables were not included in the boot page table size.

Replace all implicit mappings done through the page fault handler with
explicit mappings.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
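
The pattern used throughout this patch (an illustrative sketch with
hypothetical names, not taken verbatim from the diff): a firmware-provided
physical address is mapped explicitly before it is dereferenced, instead
of relying on the page fault handler to map it on demand:

	/* Map the header first, then the whole table once its length is known */
	kernel_add_identity_map(table_pa,
				table_pa + sizeof(struct acpi_table_header), 0);
	header = (struct acpi_table_header *)table_pa;
	kernel_add_identity_map(table_pa, table_pa + header->length, 0);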

diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 9caf89063e77..633ac56262ee 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -93,6 +93,8 @@ static u8 *scan_mem_for_rsdp(u8 *start, u32 length)
 
 	end = start + length;
 
+	kernel_add_identity_map((unsigned long)start, (unsigned long)end, 0);
+
 	/* Search from given start address for the requested length */
 	for (address = start; address < end; address += ACPI_RSDP_SCAN_STEP) {
 		/*
@@ -128,6 +130,9 @@ static acpi_physical_address bios_get_rsdp_addr(void)
 	unsigned long address;
 	u8 *rsdp;
 
+	kernel_add_identity_map((unsigned long)ACPI_EBDA_PTR_LOCATION,
+				(unsigned long)ACPI_EBDA_PTR_LOCATION + 2, 0);
+
 	/* Get the location of the Extended BIOS Data Area (EBDA) */
 	address = *(u16 *)ACPI_EBDA_PTR_LOCATION;
 	address <<= 4;
@@ -215,6 +220,9 @@ static unsigned long get_acpi_srat_table(void)
 	if (!rsdp)
 		return 0;
 
+	kernel_add_identity_map((unsigned long)rsdp,
+				(unsigned long)(rsdp + 1), 0);
+
 	/* Get ACPI root table from RSDP.*/
 	if (!(cmdline_find_option("acpi", arg, sizeof(arg)) == 4 &&
 	    !strncmp(arg, "rsdt", 4)) &&
@@ -235,6 +243,9 @@ static unsigned long get_acpi_srat_table(void)
 	if (len < sizeof(struct acpi_table_header) + size)
 		return 0;
 
+	kernel_add_identity_map((unsigned long)header,
+				(unsigned long)header + len, 0);
+
 	num_entries = (len - sizeof(struct acpi_table_header)) / size;
 	entry = (u8 *)(root_table + sizeof(struct acpi_table_header));
 
@@ -247,8 +258,16 @@ static unsigned long get_acpi_srat_table(void)
 		if (acpi_table) {
 			header = (struct acpi_table_header *)acpi_table;
 
-			if (ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_SRAT))
+			kernel_add_identity_map(acpi_table,
+						acpi_table + sizeof(*header),
+						0);
+
+			if (ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_SRAT)) {
+				kernel_add_identity_map(acpi_table,
+							acpi_table + header->length,
+							0);
 				return acpi_table;
+			}
 		}
 		entry += size;
 	}
diff --git a/arch/x86/boot/compressed/efi.c b/arch/x86/boot/compressed/efi.c
index 6edd034b0b30..ce70103fbbc0 100644
--- a/arch/x86/boot/compressed/efi.c
+++ b/arch/x86/boot/compressed/efi.c
@@ -57,10 +57,14 @@ enum efi_type efi_get_type(struct boot_params *bp)
  */
 unsigned long efi_get_system_table(struct boot_params *bp)
 {
-	unsigned long sys_tbl_pa;
+	static unsigned long sys_tbl_pa __section(".data");
 	struct efi_info *ei;
+	unsigned long sys_tbl_size;
 	enum efi_type et;
 
+	if (sys_tbl_pa)
+		return sys_tbl_pa;
+
 	/* Get systab from boot params. */
 	ei = &bp->efi_info;
 #ifdef CONFIG_X86_64
@@ -73,6 +77,13 @@ unsigned long efi_get_system_table(struct boot_params *bp)
 		return 0;
 	}
 
+	if (efi_get_type(bp) == EFI_TYPE_64)
+		sys_tbl_size = sizeof(efi_system_table_64_t);
+	else
+		sys_tbl_size = sizeof(efi_system_table_32_t);
+
+	kernel_add_identity_map(sys_tbl_pa, sys_tbl_pa + sys_tbl_size, 0);
+
 	return sys_tbl_pa;
 }
 
@@ -92,6 +103,10 @@ static struct efi_setup_data *get_kexec_setup_data(struct boot_params *bp,
 
 	pa_data = bp->hdr.setup_data;
 	while (pa_data) {
+		unsigned long pa_data_end = pa_data + sizeof(struct setup_data)
+					  + sizeof(struct efi_setup_data);
+		kernel_add_identity_map(pa_data, pa_data_end, 0);
+
 		data = (struct setup_data *)pa_data;
 		if (data->type == SETUP_EFI) {
 			esd = (struct efi_setup_data *)(pa_data + sizeof(struct setup_data));
@@ -160,6 +175,8 @@ int efi_get_conf_table(struct boot_params *bp, unsigned long *cfg_tbl_pa,
 		return -EINVAL;
 	}
 
+	kernel_add_identity_map(*cfg_tbl_pa, *cfg_tbl_pa + *cfg_tbl_len, 0);
+
 	return 0;
 }
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 4a3f223973f4..073c7cfbd785 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -687,6 +687,8 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
 	u32 nr_desc;
 	int i;
 
+	kernel_add_identity_map((unsigned long)e, (unsigned long)(e + 1), 0);
+
 	signature = (char *)&e->efi_loader_signature;
 	if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
 	    strncmp(signature, EFI64_LOADER_SIGNATURE, 4))
@@ -703,6 +705,8 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
 	pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32));
 #endif
 
+	kernel_add_identity_map(pmap, pmap + e->efi_memmap_size, 0);
+
 	nr_desc = e->efi_memmap_size / e->efi_memdesc_size;
 	for (i = 0; i < nr_desc; i++) {
 		md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 8/8] x86/boot: Remove mapping from page fault handler
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
                   ` (6 preceding siblings ...)
  2022-08-01 16:39 ` [PATCH 7/8] x86/boot: Map memory explicitly Evgeniy Baskov
@ 2022-08-01 16:39 ` Evgeniy Baskov
  2022-08-01 16:48 ` [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Dave Hansen
  8 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-01 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Evgeniy Baskov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

After every implicit mapping has been removed, this code is no longer
needed.

Remove memory mapping from the page fault handler to ensure that there
are no hidden invalid memory accesses.

Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 04022c080114..ad24289cc224 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -393,27 +393,21 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
 	unsigned long address = native_read_cr2();
 	unsigned long end;
-	bool ghcb_fault;
+	char *msg;
 
-	ghcb_fault = sev_es_check_ghcb_fault(address);
+	if (sev_es_check_ghcb_fault(address))
+		msg = "Page-fault on GHCB page:";
+	else
+		msg = "Unexpected page-fault:";
 
 	address   &= PMD_MASK;
 	end        = address + PMD_SIZE;
 
 	/*
-	 * Check for unexpected error codes. Unexpected are:
-	 *	- Faults on present pages
-	 *	- User faults
-	 *	- Reserved bits set
-	 */
-	if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
-		do_pf_error("Unexpected page-fault:", error_code, address, regs->ip);
-	else if (ghcb_fault)
-		do_pf_error("Page-fault on GHCB page:", error_code, address, regs->ip);
-
-	/*
-	 * Error code is sane - now identity map the 2M region around
-	 * the faulting address.
+	 * Since all memory allocations are made explicit
+	 * now, every page fault at this stage is an
+	 * error and the error handler is there only
+	 * for debug purposes.
 	 */
-	kernel_add_identity_map(address, end, MAP_WRITE);
+	do_pf_error(msg, error_code, address, regs->ip);
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
                   ` (7 preceding siblings ...)
  2022-08-01 16:39 ` [PATCH 8/8] x86/boot: Remove mapping from page fault handler Evgeniy Baskov
@ 2022-08-01 16:48 ` Dave Hansen
  2022-08-02  0:25   ` Evgeniy Baskov
  8 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2022-08-01 16:48 UTC (permalink / raw)
  To: Evgeniy Baskov, Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Thomas Gleixner, Andy Lutomirski,
	Peter Zijlstra, x86, linux-kernel, Alexey Khoroshilov

On 8/1/22 09:38, Evgeniy Baskov wrote:
> This is the first half of changes aimed to increase security of early
> boot code of compressed kernel for x86_64 by enforcing memory protection
> on page table level.

Could you share a little more background here?  Hardening is good, but
you _can_ have too much of a good thing.

Is this part of the boot cycle becoming a target for attackers in
trusted boot environments?  Do emerging confidential computing
technologies like SEV and TDX cause increased reliance on compressed
kernel security?

In other words, why is *THIS* important versus all the other patches
floating around out there?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-01 16:48 ` [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Dave Hansen
@ 2022-08-02  0:25   ` Evgeniy Baskov
  2022-08-02  2:41     ` Dave Hansen
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-02  0:25 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

On 2022-08-01 19:48, Dave Hansen wrote:
> On 8/1/22 09:38, Evgeniy Baskov wrote:
>> This is the first half of changes aimed to increase security of early
>> boot code of compressed kernel for x86_64 by enforcing memory 
>> protection
>> on page table level.
> 
> Could you share a little more background here?  Hardening is good, but
> you _can_ have too much of a good thing.
> 
> Is this part of the boot cycle becoming a target for attackers in
> trusted boot environments?  Do emerging confidential computing
> technologies like SEV and TDX cause increased reliance on compressed
> kernel security?
> 
> In other words, why is *THIS* important versus all the other patches
> floating around out there?

The compressed kernel code is now getting larger, partially because of the
addition of SEV and TDX support, so it is worth adding memory protection here.

The first part implements partial memory protection for every way of booting
the kernel, and the second adds a full W^X implementation specifically for
the UEFI code path. The first part also contains prerequisite changes for
the second, like adding explicit memory allocation to the extraction code
and adjusting the linker script to produce ELF images suitable for mapping
PE sections on top of them with appropriate access rights.

One of the pros of this patch set is that it reveals invalid memory
accesses by removing implicit memory mapping and reducing access rights
for mapped memory. That makes further development of the compressed
kernel code less error prone.

Furthermore, memory protection indeed makes it harder to attack the kernel
during the boot cycle. And unlike TDX and SEV, it does not only aim to
protect the kernel from attacks from outside of virtualized environments,
but also makes attacking a kernel booting on bare metal harder. If some
code injection vulnerability lives inside the compressed kernel code, this
will likely make it harder to exploit.

Another thing is that it should not bring any noticeable overhead.
The second part can actually reduce overhead slightly by removing the need
to copy the kernel image around during the boot process and by extracting
the kernel before exiting EFI boot services.

The second part also makes the kernel a more spec-compliant PE image, as
part of implementing memory protection in the EFI environment. This will
allow booting the kernel with stricter implementations of PE loaders,
e.g. [1]. And a stricter PE loader is really desirable, since the current
EDK II implementation contains numerous problems [2].

[1] https://github.com/acidanthera/audk/tree/secure_pe
[2] https://arxiv.org/pdf/2012.05471.pdf

Thanks,
Evgeniy Baskov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-02  0:25   ` Evgeniy Baskov
@ 2022-08-02  2:41     ` Dave Hansen
  2022-08-02 23:45       ` Evgeniy Baskov
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2022-08-02  2:41 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov

On 8/1/22 17:25, Evgeniy Baskov wrote:
> On 2022-08-01 19:48, Dave Hansen wrote:
>> On 8/1/22 09:38, Evgeniy Baskov wrote:
>>> This is the first half of changes aimed to increase security of early
>>> boot code of compressed kernel for x86_64 by enforcing memory protection
>>> on page table level.
>>
>> Could you share a little more background here?  Hardening is good, but
>> you _can_ have too much of a good thing.
>>
>> Is this part of the boot cycle becoming a target for attackers in
>> trusted boot environments?  Do emerging confidential computing
>> technologies like SEV and TDX cause increased reliance on compressed
>> kernel security?
>>
>> In other words, why is *THIS* important versus all the other patches
>> floating around out there?
> 
> Now compressed kernel code becomes larger, partially because of adding
> SEV and TDX, so it worth adding memory protection here.
...

Is it fair to say that the problems here are on the potential,
theoretical side rather than driven by practical, known issues that our
users face?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-02  2:41     ` Dave Hansen
@ 2022-08-02 23:45       ` Evgeniy Baskov
  2022-08-03 14:05         ` Dave Hansen
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-02 23:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov, linux-hardening

On 2022-08-02 05:41, Dave Hansen wrote:
> On 8/1/22 17:25, Evgeniy Baskov wrote:
>> On 2022-08-01 19:48, Dave Hansen wrote:
>>> On 8/1/22 09:38, Evgeniy Baskov wrote:
>>>> This is the first half of changes aimed to increase security of 
>>>> early
>>>> boot code of compressed kernel for x86_64 by enforcing memory 
>>>> protection
>>>> on page table level.
>>> 
>>> Could you share a little more background here?  Hardening is good, 
>>> but
>>> you _can_ have too much of a good thing.
>>> 
>>> Is this part of the boot cycle becoming a target for attackers in
>>> trusted boot environments?  Do emerging confidential computing
>>> technologies like SEV and TDX cause increased reliance on compressed
>>> kernel security?
>>> 
>>> In other words, why is *THIS* important versus all the other patches
>>> floating around out there?
>> 
>> Now compressed kernel code becomes larger, partially because of adding
>> SEV and TDX, so it worth adding memory protection here.
> ...
> 
> Is it fair to say that the problems here are on the potential,
> theoretical side rather than driven by practical, known issues that our
> users face?

Partially. We do have known issues because the kernel PE image is not
compliant with the MS PE and COFF specification v8.3 referenced by the
UEFI specification. UEFI implementations with stricter PE loaders
(e.g. the one mentioned above) fail to boot the Linux kernel.

As for the hardening side, these improvements are indeed just nice-to-haves.
But we believe it is good to have them if they are available for free.

Thanks,
Evgeniy Baskov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-02 23:45       ` Evgeniy Baskov
@ 2022-08-03 14:05         ` Dave Hansen
  2022-08-04 10:41           ` Evgeniy Baskov
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Hansen @ 2022-08-03 14:05 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov, linux-hardening

On 8/2/22 16:45, Evgeniy Baskov wrote:
> Partially. We do have known issues because kernel PE image is not
> compliant with the MS PE and COFF specification v8.3 referenced by
> the UEFI specification. UEFI implementations with stricter PE loaders
> (e.g. mentioned above) fail to boot Linux kernel.

That shows me that it's _possible_ to build a more strict PE loader that
wouldn't load Linux.  But, in practice is anyone using a more strict PE
loader?  Does anyone actually want that in practice?  Or, again, is this
more strict PE loader just an academic demonstration?

The README starts:

	This branch demonstrates...

That doesn't seem like something that's _important_ to deal with.
Sounds like a proof-of-concept.

Don't get me wrong, I'm all for improving things, even if the benefits
are far off.  But, let's not fool ourselves.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-03 14:05         ` Dave Hansen
@ 2022-08-04 10:41           ` Evgeniy Baskov
  2022-08-04 11:22             ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-04 10:41 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Alexey Khoroshilov, linux-hardening

On 2022-08-03 17:05, Dave Hansen wrote:
> 
> That shows me that it's _possible_ to build a more strict PE loader 
> that
> wouldn't load Linux.  But, in practice is anyone using a more strict PE
> loader?  Does anyone actually want that in practice?  Or, again, is 
> this
> more strict PE loader just an academic demonstration?
> 
> The README starts:
> 
> 	This branch demonstrates...
> 
> That doesn't seem like something that's _important_ to deal with.
> Sounds like a proof-of-concept.
> 
> Don't get me wrong, I'm all for improving thing, even if the benefits
> are far off.  But, let's not fool ourselves.

We have a commercial closed-source UEFI firmware implementation at ISP RAS
that follows the behavior of the secure_pe branch. That firmware is used
as a part of [1].

[1] https://www.ispras.ru/en/technologies/asperitas/

Thanks,
Evgeniy Baskov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-04 10:41           ` Evgeniy Baskov
@ 2022-08-04 11:22             ` Greg KH
  2022-08-04 14:26               ` Evgeniy Baskov
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2022-08-04 11:22 UTC (permalink / raw)
  To: Evgeniy Baskov
  Cc: Dave Hansen, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Thomas Gleixner, Andy Lutomirski, Peter Zijlstra, x86,
	linux-kernel, Alexey Khoroshilov, linux-hardening

On Thu, Aug 04, 2022 at 01:41:58PM +0300, Evgeniy Baskov wrote:
> On 2022-08-03 17:05, Dave Hansen wrote:
> > 
> > That shows me that it's _possible_ to build a more strict PE loader that
> > wouldn't load Linux.  But, in practice is anyone using a more strict PE
> > loader?  Does anyone actually want that in practice?  Or, again, is this
> > more strict PE loader just an academic demonstration?
> > 
> > The README starts:
> > 
> > 	This branch demonstrates...
> > 
> > That doesn't seem like something that's _important_ to deal with.
> > Sounds like a proof-of-concept.
> > 
> > Don't get me wrong, I'm all for improving thing, even if the benefits
> > are far off.  But, let's not fool ourselves.
> 
> We have commercial closed-source UEFI firmware implementation at ISP RAS
> that follows the behavior of the secure_pe branch. That firmware is used
> as a part of [1].
> 
> [1] https://www.ispras.ru/en/technologies/asperitas/

Are there any plans on getting those changes merged back upstream to the
main UEFI codebase so that others can test this type of functionality
out?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1
  2022-08-04 11:22             ` Greg KH
@ 2022-08-04 14:26               ` Evgeniy Baskov
  0 siblings, 0 replies; 17+ messages in thread
From: Evgeniy Baskov @ 2022-08-04 14:26 UTC (permalink / raw)
  To: Greg KH
  Cc: Dave Hansen, Borislav Petkov, Dave Hansen, Ingo Molnar,
	Thomas Gleixner, Andy Lutomirski, Peter Zijlstra, x86,
	linux-kernel, Alexey Khoroshilov, linux-hardening

On 2022-08-04 14:22, Greg KH wrote:
...
> Are there any plans on getting those changes merged back upstream to 
> the
> main UEFI codebase so that others can test this type of functionality
> out?
> 
> thanks,
> 
> greg k-h

The initial prototype of the changes was published as a part of
tianocore/edk2-staging [1], and a more up-to-date open source version
was published as a part of acidanthera/audk [2]. This version is currently
being integrated with the EDK II build system, and problems with that
integration are currently the main technical obstacle to getting the
changes into the main branch.

It is hard to estimate when the merge with the edk2 mainline will happen,
but we are committed to doing it. The amount of changes needed
is quite large, and simply getting approval from all the maintainers
will take time even if they are all willing to get this in.
On the positive side, several parties, Microsoft in particular,
have been interested in upstreaming this code, so we have moderate
optimism for the future.

In case you are interested in the details, there is also academic
material available describing the issues and the changes made,
which can help shed some light on the implementation [3][4].

[1] 
https://github.com/tianocore/edk2-staging/tree/2021-gsoc-secure-loader
[2] https://github.com/acidanthera/audk/tree/secure_pe
[3] https://arxiv.org/pdf/2012.05471.pdf
[4] https://github.com/mhaeuser/ISPRASOpen-SecurePE

Thanks,
Evgeniy Baskov

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-08-04 14:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-01 16:38 [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Evgeniy Baskov
2022-08-01 16:38 ` [PATCH 1/8] x86/boot: Align vmlinuz sections on page size Evgeniy Baskov
2022-08-01 16:38 ` [PATCH 2/8] x86/build: Remove RWX sections and align on 4KB Evgeniy Baskov
2022-08-01 16:39 ` [PATCH 3/8] x86/boot: Set cr0 to known state in trampoline Evgeniy Baskov
2022-08-01 16:39 ` [PATCH 4/8] x86/boot: Increase boot page table size Evgeniy Baskov
2022-08-01 16:39 ` [PATCH 5/8] x86/boot: Support 4KB pages for identity mapping Evgeniy Baskov
2022-08-01 16:39 ` [PATCH 6/8] x86/boot: Setup memory protection for bzImage code Evgeniy Baskov
2022-08-01 16:39 ` [PATCH 7/8] x86/boot: Map memory explicitly Evgeniy Baskov
2022-08-01 16:39 ` [PATCH 8/8] x86/boot: Remove mapping from page fault handler Evgeniy Baskov
2022-08-01 16:48 ` [RFC PATCH 0/8] x86_64: Harden compressed kernel, part 1 Dave Hansen
2022-08-02  0:25   ` Evgeniy Baskov
2022-08-02  2:41     ` Dave Hansen
2022-08-02 23:45       ` Evgeniy Baskov
2022-08-03 14:05         ` Dave Hansen
2022-08-04 10:41           ` Evgeniy Baskov
2022-08-04 11:22             ` Greg KH
2022-08-04 14:26               ` Evgeniy Baskov
