* [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT
@ 2017-12-11 23:40 Andy Lutomirski
  2017-12-12 10:09 ` Ingo Molnar
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-11 23:40 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski

This should fix some existing 5-level bugs and get VSYSCALL and LDT
working with PTI.

Changes from v1:
 - vsyscalls actually work.
 - Added the "Warn and fail" patch to prevent the testing goof I had on v1.
 - Lots of cleanups
 
Andy Lutomirski (10):
  x86/espfix/64: Fix espfix double-fault handling on 5-level systems
  x86/pti: Vastly simplify pgd synchronization
  x86/pti/64: Fix ESPFIX64 user mapping
  Revert "x86/mm/pti: Disable native VSYSCALL"
  x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
  x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
  x86/pti: Map the vsyscall page if needed
  x86/mm/64: Improve the memory map documentation
  x86/mm/64: Make a full PGD-entry size hole in the memory map
  x86/pti: Put the LDT in its own PGD if PTI is on

 Documentation/x86/x86_64/mm.txt         |  15 +--
 arch/x86/Kconfig                        |   8 --
 arch/x86/entry/vsyscall/vsyscall_64.c   |  37 +++++++-
 arch/x86/include/asm/mmu_context.h      |  48 +++++++++-
 arch/x86/include/asm/pgtable.h          |   6 +-
 arch/x86/include/asm/pgtable_64.h       |  77 +++++++---------
 arch/x86/include/asm/pgtable_64_types.h |   8 +-
 arch/x86/include/asm/processor.h        |  23 +++--
 arch/x86/include/asm/vsyscall.h         |   1 +
 arch/x86/kernel/espfix_64.c             |  16 ----
 arch/x86/kernel/ldt.c                   | 155 +++++++++++++++++++++++++++++--
 arch/x86/kernel/traps.c                 |   2 +-
 arch/x86/mm/dump_pagetables.c           |  12 +++
 arch/x86/mm/pti.c                       | 157 ++++++++++++++++++++++----------
 init/main.c                             |  11 ++-
 15 files changed, 426 insertions(+), 150 deletions(-)

-- 
2.13.6


* Re: [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
@ 2017-12-12 10:09 ` Ingo Molnar
  2017-12-12 15:58   ` Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 01/10] x86/espfix/64: Fix espfix double-fault handling on 5-level systems Andy Lutomirski
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2017-12-12 10:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra


* Andy Lutomirski <luto@kernel.org> wrote:

> This should fix some existing 5-level bugs and get VSYSCALL and LDT
> working with PTI.
> 
> Changes from v1:
>  - vsyscalls actually work.
>  - Added the "Warn and fail" patch to prevent the testing goof I had on v1.
>  - Lots of cleanups
>  
> Andy Lutomirski (10):
>   x86/espfix/64: Fix espfix double-fault handling on 5-level systems
>   x86/pti: Vastly simplify pgd synchronization
>   x86/pti/64: Fix ESPFIX64 user mapping
>   Revert "x86/mm/pti: Disable native VSYSCALL"
>   x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
>   x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
>   x86/pti: Map the vsyscall page if needed
>   x86/mm/64: Improve the memory map documentation
>   x86/mm/64: Make a full PGD-entry size hole in the memory map
>   x86/pti: Put the LDT in its own PGD if PTI is on
> 
>  Documentation/x86/x86_64/mm.txt         |  15 +--
>  arch/x86/Kconfig                        |   8 --
>  arch/x86/entry/vsyscall/vsyscall_64.c   |  37 +++++++-
>  arch/x86/include/asm/mmu_context.h      |  48 +++++++++-
>  arch/x86/include/asm/pgtable.h          |   6 +-
>  arch/x86/include/asm/pgtable_64.h       |  77 +++++++---------
>  arch/x86/include/asm/pgtable_64_types.h |   8 +-
>  arch/x86/include/asm/processor.h        |  23 +++--
>  arch/x86/include/asm/vsyscall.h         |   1 +
>  arch/x86/kernel/espfix_64.c             |  16 ----
>  arch/x86/kernel/ldt.c                   | 155 +++++++++++++++++++++++++++++--
>  arch/x86/kernel/traps.c                 |   2 +-
>  arch/x86/mm/dump_pagetables.c           |  12 +++
>  arch/x86/mm/pti.c                       | 157 ++++++++++++++++++++++----------
>  init/main.c                             |  11 ++-
>  15 files changed, 426 insertions(+), 150 deletions(-)

Hm, I only received the 0/10 boilerplate email, not any of the patches - and 
Thomas tells me he too only got the cover letter.

Thanks,

	Ingo


* [PATCH PTI v3 01/10] x86/espfix/64: Fix espfix double-fault handling on 5-level systems
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
  2017-12-12 10:09 ` Ingo Molnar
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-12 17:18   ` Kirill A. Shutemov
  2017-12-15 18:34   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 02/10] x86/pti: Vastly simplify pgd synchronization Andy Lutomirski
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski, stable,
	Kirill A. Shutemov, Dave Hansen

Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems
was wrong, and it resulted in panics due to unhandled double faults.
Use P4D_SHIFT instead, which is correct on 4-level and 5-level
machines.

This fixes a panic when running x86 selftests on 5-level machines.
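
For reference, the reason P4D_SHIFT is correct on both paging modes:
with 4-level paging the p4d level is folded, so P4D_SHIFT equals
PGDIR_SHIFT (both 39); with 5-level paging PGDIR_SHIFT grows to 48
while the espfix area still occupies a single 2^39-byte slot.  A
minimal sketch of the check (the helper name is made up; the
identifiers are the ones used in the hunk below):

	/* True if sp points into the espfix region on 4- or 5-level. */
	static bool sp_on_espfix_stack(unsigned long sp)
	{
		return ((long)sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY;
	}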

Fixes: 1d33b219563f ("x86/espfix: Add support for 5-level paging")
Cc: stable@vger.kernel.org
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 74136fd16f49..c4e5b0a7516f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -360,7 +360,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 	 *
 	 * No need for ist_enter here because we don't use RCU.
 	 */
-	if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
+	if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY &&
 		regs->cs == __KERNEL_CS &&
 		regs->ip == (unsigned long)native_irq_return_iret)
 	{
-- 
2.13.6


* [PATCH PTI v3 02/10] x86/pti: Vastly simplify pgd synchronization
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
  2017-12-12 10:09 ` Ingo Molnar
  2017-12-12 15:56 ` [PATCH PTI v3 01/10] x86/espfix/64: Fix espfix double-fault handling on 5-level systems Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping Andy Lutomirski
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski

Back when we would dynamically add mappings to the usermode tables,
we needed to preallocate all the high top-level entries in the
usermode tables.  We don't need this in recent versions of PTI, so
get rid of preallocation.

With preallocation gone, the comments in pti_set_user_pgd() make
even less sense.  Rearrange the function to make it entirely obvious
what it does and does not do.  FWIW, I haven't even tried to wrap my
head around the old logic, since it seemed to be somewhere between
incomprehensible and wrong.

I admit that a bit of the earlier complexity was based on my
suggestions.  Mea culpa.
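
The end result is simple enough to restate in a few lines (a sketch of
the logic in the hunk below, not a drop-in replacement):

	/*
	 * Only user-half PGD slots are mirrored into the user tables;
	 * the kernel's copy of a user-accessible entry gets NX as a
	 * safety net against returning to usermode with the kernel CR3.
	 */
	if (pgdp_maps_userspace(pgdp)) {
		kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;
		if ((pgd.pgd & _PAGE_USER) && (__supported_pte_mask & _PAGE_NX))
			pgd.pgd |= _PAGE_NX;
	}
	return pgd;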

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/pgtable_64.h | 74 +++++++++++++++------------------------
 arch/x86/mm/pti.c                 | 52 ++++++---------------------
 2 files changed, 39 insertions(+), 87 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index f5adf92091c6..be8d086de927 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -195,14 +195,6 @@ static inline bool pgdp_maps_userspace(void *__ptr)
 }
 
 /*
- * Does this PGD allow access from userspace?
- */
-static inline bool pgd_userspace_access(pgd_t pgd)
-{
-	return pgd.pgd & _PAGE_USER;
-}
-
-/*
  * Take a PGD location (pgdp) and a pgd value that needs to be set there.
  * Populates the user and returns the resulting PGD that must be set in
  * the kernel copy of the page tables.
@@ -213,50 +205,42 @@ static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
 	if (!static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI))
 		return pgd;
 
-	if (pgd_userspace_access(pgd)) {
-		if (pgdp_maps_userspace(pgdp)) {
-			/*
-			 * The user page tables get the full PGD,
-			 * accessible from userspace:
-			 */
-			kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;
-			/*
-			 * For the copy of the pgd that the kernel uses,
-			 * make it unusable to userspace.  This ensures on
-			 * in case that a return to userspace with the
-			 * kernel CR3 value, userspace will crash instead
-			 * of running.
-			 *
-			 * Note: NX might be not available or disabled.
-			 */
-			if (__supported_pte_mask & _PAGE_NX)
-				pgd.pgd |= _PAGE_NX;
-		}
-	} else if (pgd_userspace_access(*pgdp)) {
+	if (pgdp_maps_userspace(pgdp)) {
 		/*
-		 * We are clearing a _PAGE_USER PGD for which we presumably
-		 * populated the user PGD.  We must now clear the user PGD
-		 * entry.
+		 * The user page tables get the full PGD,
+		 * accessible from userspace:
 		 */
-		if (pgdp_maps_userspace(pgdp)) {
-			kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;
-		} else {
-			/*
-			 * Attempted to clear a _PAGE_USER PGD which is in
-			 * the kernel portion of the address space.  PGDs
-			 * are pre-populated and we never clear them.
-			 */
-			WARN_ON_ONCE(1);
-		}
+		kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd;
+
+		/*
+		 * If this is normal user memory, make it NX in the kernel
+		 * pagetables so that, if we somehow screw up and return to
+		 * usermode with the kernel CR3 loaded, we'll get a page
+		 * fault instead of allowing user code to execute with
+		 * the wrong CR3.
+		 *
+		 * As exceptions, we don't set NX if:
+		 *  - this is EFI or similar, the kernel may execute from it
+		 *  - we don't have NX support
+		 *  - we're clearing the PGD (i.e. pgd.pgd == 0).
+		 */
+		if ((pgd.pgd & _PAGE_USER) && (__supported_pte_mask & _PAGE_NX))
+			pgd.pgd |= _PAGE_NX;
 	} else {
 		/*
-		 * _PAGE_USER was not set in either the PGD being set or
-		 * cleared.  All kernel PGDs should be pre-populated so
-		 * this should never happen after boot.
+		 * Changes to the high (kernel) portion of the kernelmode
+		 * page tables are not automatically propagated to the
+		 * usermode tables.
+		 *
+		 * Users should keep in mind that, unlike the kernelmode
+		 * tables, there is no vmalloc_fault equivalent for the
+		 * usermode tables.  Top-level entries added to init_mm's
+		 * usermode pgd after boot will not be automatically
+		 * propagated to other mms.
 		 */
-		WARN_ON_ONCE(system_state == SYSTEM_RUNNING);
 	}
 #endif
+
 	/* return the copy of the PGD we want the kernel to use: */
 	return pgd;
 }
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index bd5d042adb3c..f48645d2f3fd 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -83,8 +83,16 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
 	}
 
 	if (pgd_none(*pgd)) {
-		WARN_ONCE(1, "All user pgds should have been populated\n");
-		return NULL;
+		unsigned long new_p4d_page = __get_free_page(gfp);
+		if (!new_p4d_page)
+			return NULL;
+
+		if (pgd_none(*pgd)) {
+			set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
+			new_p4d_page = 0;
+		}
+		if (new_p4d_page)
+			free_page(new_p4d_page);
 	}
 	BUILD_BUG_ON(pgd_large(*pgd) != 0);
 
@@ -193,45 +201,6 @@ static void __init pti_clone_entry_text(void)
 }
 
 /*
- * Ensure that the top level of the user page tables are entirely
- * populated.  This ensures that all processes that get forked have the
- * same entries.  This way, we do not have to ever go set up new entries in
- * older processes.
- *
- * Note: we never free these, so there are no updates to them after this.
- */
-static void __init pti_init_all_pgds(void)
-{
-	pgd_t *pgd;
-	int i;
-
-	pgd = kernel_to_user_pgdp(pgd_offset_k(0UL));
-	for (i = PTRS_PER_PGD / 2; i < PTRS_PER_PGD; i++) {
-		/*
-		 * Each PGD entry moves up PGDIR_SIZE bytes through the
-		 * address space, so get the first virtual address mapped
-		 * by PGD #i:
-		 */
-		unsigned long addr = i * PGDIR_SIZE;
-#if CONFIG_PGTABLE_LEVELS > 4
-		p4d_t *p4d = p4d_alloc_one(&init_mm, addr);
-		if (!p4d) {
-			WARN_ON(1);
-			break;
-		}
-		set_pgd(pgd + i, __pgd(_KERNPG_TABLE | __pa(p4d)));
-#else /* CONFIG_PGTABLE_LEVELS <= 4 */
-		pud_t *pud = pud_alloc_one(&init_mm, addr);
-		if (!pud) {
-			WARN_ON(1);
-			break;
-		}
-		set_pgd(pgd + i, __pgd(_KERNPG_TABLE | __pa(pud)));
-#endif /* CONFIG_PGTABLE_LEVELS */
-	}
-}
-
-/*
  * Initialize kernel page table isolation
  */
 void __init pti_init(void)
@@ -241,7 +210,6 @@ void __init pti_init(void)
 
 	pr_info("enabled\n");
 
-	pti_init_all_pgds();
 	pti_clone_user_shared();
 	pti_clone_entry_text();
 }
-- 
2.13.6


* [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (2 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 02/10] x86/pti: Vastly simplify pgd synchronization Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-13 13:12   ` Kirill A. Shutemov
  2017-12-12 15:56 ` [PATCH PTI v3 04/10] Revert "x86/mm/pti: Disable native VSYSCALL" Andy Lutomirski
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski

The ESPFIX64 user mapping belongs in pti.c just like all the other
user mappings.  Move it there and make it work correctly while we're
at it.
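
A note on why sharing at the p4d level works on both paging modes:
espfix occupies exactly one p4d entry, and with 4-level paging the p4d
is folded into the pgd, so copying that single entry covers the whole
area either way.  In sketch form (mirroring pti_setup_espfix64() in
the diff below):

	p4d_t *kernel_p4d = p4d_offset(pgd_offset_k(ESPFIX_BASE_ADDR),
				       ESPFIX_BASE_ADDR);
	p4d_t *user_p4d = pti_user_pagetable_walk_p4d(ESPFIX_BASE_ADDR);

	*user_p4d = *kernel_p4d;	/* one entry maps all of espfix */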

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/espfix_64.c | 16 ----------------
 arch/x86/mm/pti.c           | 42 +++++++++++++++++++++++++++++++++++++-----
 init/main.c                 | 11 +++++++----
 3 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 1c44e72ed1bc..9c4e7ba6870c 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -129,22 +129,6 @@ void __init init_espfix_bsp(void)
 	p4d = p4d_alloc(&init_mm, pgd, ESPFIX_BASE_ADDR);
 	p4d_populate(&init_mm, p4d, espfix_pud_page);
 
-	/*
-	 * Just copy the top-level PGD that is mapping the espfix area to
-	 * ensure it is mapped into the user page tables.
-	 *
-	 * For 5-level paging, the espfix pgd was populated when
-	 * pti_init() pre-populated all the pgd entries.  The above
-	 * p4d_alloc() would never do anything and the p4d_populate() would
-	 * be done to a p4d already mapped in the userspace pgd.
-	 */
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-	if (CONFIG_PGTABLE_LEVELS <= 4) {
-		set_pgd(kernel_to_user_pgdp(pgd),
-			__pgd(_KERNPG_TABLE | (p4d_pfn(*p4d) << PAGE_SHIFT)));
-	}
-#endif
-
 	/* Randomize the locations */
 	init_espfix_random();
 
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index f48645d2f3fd..e01c4aa3ec73 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -68,14 +68,12 @@ void __init pti_check_boottime_disable(void)
  * Walk the user copy of the page tables (optionally) trying to allocate
  * page table pages on the way down.
  *
- * Returns a pointer to a PMD on success, or NULL on failure.
+ * Returns a pointer to a P4D on success, or NULL on failure.
  */
-static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
 {
 	pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
 	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
-	pud_t *pud;
-	p4d_t *p4d;
 
 	if (address < PAGE_OFFSET) {
 		WARN_ONCE(1, "attempt to walk user address\n");
@@ -96,7 +94,21 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
 	}
 	BUILD_BUG_ON(pgd_large(*pgd) != 0);
 
-	p4d = p4d_offset(pgd, address);
+	return p4d_offset(pgd, address);
+}
+
+/*
+ * Walk the user copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.
+ *
+ * Returns a pointer to a PMD on success, or NULL on failure.
+ */
+static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+{
+	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+	p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
+	pud_t *pud;
+
 	BUILD_BUG_ON(p4d_large(*p4d) != 0);
 	if (p4d_none(*p4d)) {
 		unsigned long new_pud_page = __get_free_page(gfp);
@@ -174,6 +186,25 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 	}
 }
 
+static void __init pti_setup_espfix64(void)
+{
+#ifdef CONFIG_X86_ESPFIX64
+	/*
+	 * ESPFIX64 uses a single p4d (i.e. a top-level entry on 4-level
+	 * systems and a next-level entry on 5-level systems.  Share that
+	 * entry between the user and kernel pagetables.
+	 */
+	pgd_t *kernel_pgd;
+	p4d_t *kernel_p4d, *user_p4d;
+
+	pr_err("espfix64 base = %lx\n", ESPFIX_BASE_ADDR);
+	user_p4d = pti_user_pagetable_walk_p4d(ESPFIX_BASE_ADDR);
+	kernel_pgd = pgd_offset_k(ESPFIX_BASE_ADDR);
+	kernel_p4d = p4d_offset(kernel_pgd, ESPFIX_BASE_ADDR);
+	*user_p4d = *kernel_p4d;
+#endif
+}
+
 /*
  * Clone the populated PMDs of the user shared fixmaps into the user space
  * visible page table.
@@ -212,4 +243,5 @@ void __init pti_init(void)
 
 	pti_clone_user_shared();
 	pti_clone_entry_text();
+	pti_setup_espfix64();
 }
diff --git a/init/main.c b/init/main.c
index 64b00b89e9e1..ce18b938c382 100644
--- a/init/main.c
+++ b/init/main.c
@@ -505,6 +505,13 @@ static void __init mm_init(void)
 	pgtable_init();
 	vmalloc_init();
 	ioremap_huge_init();
+
+#ifdef CONFIG_X86_ESPFIX64
+	/* Should be run before the first non-init thread is created */
+	init_espfix_bsp();
+#endif
+
+	/* Should be run after espfix64 is set up. */
 	pti_init();
 }
 
@@ -676,10 +683,6 @@ asmlinkage __visible void __init start_kernel(void)
 	if (efi_enabled(EFI_RUNTIME_SERVICES))
 		efi_enter_virtual_mode();
 #endif
-#ifdef CONFIG_X86_ESPFIX64
-	/* Should be run before the first non-init thread is created */
-	init_espfix_bsp();
-#endif
 	thread_stack_cache_init();
 	cred_init();
 	fork_init();
-- 
2.13.6


* [PATCH PTI v3 04/10] Revert "x86/mm/pti: Disable native VSYSCALL"
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (3 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 05/10] x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy Andy Lutomirski
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski

This reverts commit 6a7b4041b853ecc653e2c1dda5b736ab5fd29357.

With the PGD-propagation logic simplified, there's no need for this.
---
 arch/x86/Kconfig | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 411838058194..babb1e53b0a6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2233,9 +2233,6 @@ choice
 
 	config LEGACY_VSYSCALL_NATIVE
 		bool "Native"
-		# The VSYSCALL page comes from the kernel page tables
-		# and is not available when PAGE_TABLE_ISOLATION is enabled.
-		depends on !PAGE_TABLE_ISOLATION
 		help
 		  Actual executable code is located in the fixed vsyscall
 		  address mapping, implementing time() efficiently. Since
@@ -2253,11 +2250,6 @@ choice
 		  exploits. This configuration is recommended when userspace
 		  still uses the vsyscall area.
 
-		  When PAGE_TABLE_ISOLATION is enabled, the vsyscall area will become
-		  unreadable.  This emulation option still works, but PAGE_TABLE_ISOLATION
-		  will make it harder to do things like trace code using the
-		  emulation.
-
 	config LEGACY_VSYSCALL_NONE
 		bool "None"
 		help
-- 
2.13.6


* [PATCH PTI v3 05/10] x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (4 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 04/10] Revert "x86/mm/pti: Disable native VSYSCALL" Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 06/10] x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode Andy Lutomirski
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski

The kernel is very erratic as to which pagetables have _PAGE_USER
set.  The vsyscall page gets lucky: it seems that all of the
relevant pagetables are among the apparently arbitrary ones that set
_PAGE_USER.  Rather than relying on chance, just explicitly set
_PAGE_USER.

This will let us clean up pagetable setup to stop setting
_PAGE_USER.  The added code can also be reused by pagetable
isolation to manage the _PAGE_USER bit in the usermode tables.
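
The hardware rule being relied on: a user-mode access is permitted only
if the U/S bit is set in every paging-structure entry used to translate
the address, so a cleared _PAGE_USER at any single level hides the page
from userspace.  A sketch (made-up helper, not kernel code):

	/* Effective user permission is the AND of U/S across all levels. */
	static bool user_may_access(pgdval_t pgd, p4dval_t p4d, pudval_t pud,
				    pmdval_t pmd, pteval_t pte)
	{
		return pgd & p4d & pud & pmd & pte & _PAGE_USER;
	}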

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index f279ba2643dc..bc88a0540347 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -329,16 +329,47 @@ int in_gate_area_no_mm(unsigned long addr)
 	return vsyscall_mode != NONE && (addr & PAGE_MASK) == VSYSCALL_ADDR;
 }
 
+/*
+ * The VSYSCALL page is the only user-accessible page in the kernel address
+ * range.  Normally, the kernel page tables can have _PAGE_USER clear, but
+ * the tables covering VSYSCALL_ADDR need _PAGE_USER set if vsyscalls
+ * are enabled.
+ *
+ * Some day we may create a "minimal" vsyscall mode in which we emulate
+ * vsyscalls but leave the page not present.  If so, we skip calling
+ * this.
+ */
+static void __init set_vsyscall_pgtable_user_bits(void)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd = pgd_offset_k(VSYSCALL_ADDR);
+	pgd->pgd |= _PAGE_USER;
+	p4d = p4d_offset(pgd, VSYSCALL_ADDR);
+#if CONFIG_PGTABLE_LEVELS >= 5
+	p4d->p4d |= _PAGE_USER;
+#endif
+	pud = pud_offset(p4d, VSYSCALL_ADDR);
+	pud->pud |= _PAGE_USER;
+	pmd = pmd_offset(pud, VSYSCALL_ADDR);
+	pmd->pmd |= _PAGE_USER;
+}
+
 void __init map_vsyscall(void)
 {
 	extern char __vsyscall_page;
 	unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page);
 
-	if (vsyscall_mode != NONE)
+	if (vsyscall_mode != NONE) {
 		__set_fixmap(VSYSCALL_PAGE, physaddr_vsyscall,
 			     vsyscall_mode == NATIVE
 			     ? PAGE_KERNEL_VSYSCALL
 			     : PAGE_KERNEL_VVAR);
+		set_vsyscall_pgtable_user_bits();
+	}
 
 	BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
 		     (unsigned long)VSYSCALL_ADDR);
-- 
2.13.6


* [PATCH PTI v3 06/10] x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (5 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 05/10] x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 07/10] x86/pti: Map the vsyscall page if needed Andy Lutomirski
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski

If something goes wrong with pagetable setup, vsyscall=native will
accidentally fall back to emulation.  Make it warn and fail so that
we notice.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index bc88a0540347..a06f2ae09ad6 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -138,6 +138,10 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 
 	WARN_ON_ONCE(address != regs->ip);
 
+	/* This should be unreachable in NATIVE mode. */
+	if (WARN_ON(vsyscall_mode == NATIVE))
+		return false;
+
 	if (vsyscall_mode == NONE) {
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall attempted with vsyscall=none");
-- 
2.13.6


* [PATCH PTI v3 07/10] x86/pti: Map the vsyscall page if needed
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (6 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 06/10] x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 08/10] x86/mm/64: Improve the memory map documentation Andy Lutomirski
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski

Make VSYSCALLs work fully in PTI mode.
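
In outline, two steps are needed (a sketch of what the code below does;
"kernel_pte" stands for the PTE found via lookup_address()):

	/* 1) Copy the kernel's vsyscall PTE into the user tables... */
	*pti_user_pagetable_walk_pte(VSYSCALL_ADDR) = *kernel_pte;

	/* 2) ...and set _PAGE_USER at each level of the user-table
	 *    hierarchy covering VSYSCALL_ADDR: */
	set_vsyscall_pgtable_user_bits(kernel_to_user_pgdp(swapper_pg_dir));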

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/vsyscall/vsyscall_64.c |  6 ++--
 arch/x86/include/asm/pgtable.h        |  6 +++-
 arch/x86/include/asm/pgtable_64.h     |  9 +++--
 arch/x86/include/asm/vsyscall.h       |  1 +
 arch/x86/mm/pti.c                     | 63 +++++++++++++++++++++++++++++++++++
 5 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index a06f2ae09ad6..e4a6fe8354f0 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -343,14 +343,14 @@ int in_gate_area_no_mm(unsigned long addr)
  * vsyscalls but leave the page not present.  If so, we skip calling
  * this.
  */
-static void __init set_vsyscall_pgtable_user_bits(void)
+void __init set_vsyscall_pgtable_user_bits(pgd_t *root)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
 
-	pgd = pgd_offset_k(VSYSCALL_ADDR);
+	pgd = pgd_offset_pgd(root, VSYSCALL_ADDR);
 	pgd->pgd |= _PAGE_USER;
 	p4d = p4d_offset(pgd, VSYSCALL_ADDR);
 #if CONFIG_PGTABLE_LEVELS >= 5
@@ -372,7 +372,7 @@ void __init map_vsyscall(void)
 			     vsyscall_mode == NATIVE
 			     ? PAGE_KERNEL_VSYSCALL
 			     : PAGE_KERNEL_VVAR);
-		set_vsyscall_pgtable_user_bits();
+		set_vsyscall_pgtable_user_bits(swapper_pg_dir);
 	}
 
 	BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 83c0c77e7365..a8a8fc15ca16 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -920,7 +920,11 @@ static inline int pgd_none(pgd_t pgd)
  * pgd_offset() returns a (pgd_t *)
  * pgd_index() is used get the offset into the pgd page's array of pgd_t's;
  */
-#define pgd_offset(mm, address) ((mm)->pgd + pgd_index((address)))
+#define pgd_offset_pgd(pgd, address) (pgd + pgd_index((address)))
+/*
+ * a shortcut to get a pgd_t in a given mm
+ */
+#define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address))
 /*
  * a shortcut which implies the use of the kernel's pgd, instead
  * of a process's
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index be8d086de927..a2fb3f8bc985 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -220,11 +220,14 @@ static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
 		 * the wrong CR3.
 		 *
 		 * As exceptions, we don't set NX if:
-		 *  - this is EFI or similar, the kernel may execute from it
+		 *  - _PAGE_USER is not set.  This could be an executable
+		 *     EFI runtime mapping or something similar, and the kernel
+		 *     may execute from it
 		 *  - we don't have NX support
-		 *  - we're clearing the PGD (i.e. pgd.pgd == 0).
+		 *  - we're clearing the PGD (i.e. the new pgd is not present).
 		 */
-		if ((pgd.pgd & _PAGE_USER) && (__supported_pte_mask & _PAGE_NX))
+		if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) &&
+		    (__supported_pte_mask & _PAGE_NX))
 			pgd.pgd |= _PAGE_NX;
 	} else {
 		/*
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index d9a7c659009c..b986b2ca688a 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -7,6 +7,7 @@
 
 #ifdef CONFIG_X86_VSYSCALL_EMULATION
 extern void map_vsyscall(void);
+extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
 
 /*
  * Called on instruction fetch fault in vsyscall page.
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index e01c4aa3ec73..b984e2311969 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -38,6 +38,7 @@
 
 #include <asm/cpufeature.h>
 #include <asm/hypervisor.h>
+#include <asm/vsyscall.h>
 #include <asm/cmdline.h>
 #include <asm/pti.h>
 #include <asm/pgtable.h>
@@ -145,6 +146,48 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
 	return pmd_offset(pud, address);
 }
 
+/*
+ * Walk the shadow copy of the page tables (optionally) trying to allocate
+ * page table pages on the way down.  Does not support large pages.
+ *
+ * Note: this is only used when mapping *new* kernel data into the
+ * user/shadow page tables.  It is never used for userspace data.
+ *
+ * Returns a pointer to a PTE on success, or NULL on failure.
+ */
+static pte_t *pti_user_pagetable_walk_pte(unsigned long address)
+{
+	gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
+	pmd_t *pmd = pti_user_pagetable_walk_pmd(address);
+	pte_t *pte;
+
+	/* We can't do anything sensible if we hit a large mapping. */
+	if (pmd_large(*pmd)) {
+		WARN_ON(1);
+		return NULL;
+	}
+
+	if (pmd_none(*pmd)) {
+		unsigned long new_pte_page = __get_free_page(gfp);
+		if (!new_pte_page)
+			return NULL;
+
+		if (pmd_none(*pmd)) {
+			set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
+			new_pte_page = 0;
+		}
+		if (new_pte_page)
+			free_page(new_pte_page);
+	}
+
+	pte = pte_offset_kernel(pmd, address);
+	if (pte_flags(*pte) & _PAGE_USER) {
+		WARN_ONCE(1, "attempt to walk to user pte\n");
+		return NULL;
+	}
+	return pte;
+}
+
 static void __init
 pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 {
@@ -205,6 +248,25 @@ static void __init pti_setup_espfix64(void)
 #endif
 }
 
+static void __init pti_setup_vsyscall(void)
+{
+#ifdef CONFIG_X86_VSYSCALL_EMULATION
+	pte_t *pte, *target_pte;
+	unsigned int level;
+
+	pte = lookup_address(VSYSCALL_ADDR, &level);
+	if (!pte || WARN_ON(level != PG_LEVEL_4K))
+		return;
+
+	target_pte = pti_user_pagetable_walk_pte(VSYSCALL_ADDR);
+	if (WARN_ON(!target_pte))
+		return;
+
+	*target_pte = *pte;
+	set_vsyscall_pgtable_user_bits(kernel_to_user_pgdp(swapper_pg_dir));
+#endif
+}
+
 /*
  * Clone the populated PMDs of the user shared fixmaps into the user space
  * visible page table.
@@ -244,4 +306,5 @@ void __init pti_init(void)
 	pti_clone_user_shared();
 	pti_clone_entry_text();
 	pti_setup_espfix64();
+	pti_setup_vsyscall();
 }
-- 
2.13.6


* [PATCH PTI v3 08/10] x86/mm/64: Improve the memory map documentation
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (7 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 07/10] x86/pti: Map the vsyscall page if needed Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 09/10] x86/mm/64: Make a full PGD-entry size hole in the memory map Andy Lutomirski
  2017-12-12 15:56 ` [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on Andy Lutomirski
  10 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski, Kirill A. Shutemov,
	Dave Hansen

The old docs had the vsyscall range wrong* and were missing the
fixmap.  Fix both.

* There used to be 8 MB reserved for future vsyscalls, but that's
  long gone.

Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 Documentation/x86/x86_64/mm.txt | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 2d7d6590ade8..63a41671d25b 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -17,8 +17,9 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
 ... unused hole ...
 ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
-ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space (variable)
-ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
+ffffffffa0000000 - [fixmap start]   (~1526 MB) module mapping space (variable)
+[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
+ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
 ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
 
 Virtual memory map with 5 level page tables:
@@ -39,8 +40,9 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
 ... unused hole ...
 ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
-ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space
-ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
+ffffffffa0000000 - [fixmap start]   (~1526 MB) module mapping space
+[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
+ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
 ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
 
 Architecture defines a 64-bit virtual address. Implementations can support
-- 
2.13.6


* [PATCH PTI v3 09/10] x86/mm/64: Make a full PGD-entry size hole in the memory map
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (8 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 08/10] x86/mm/64: Improve the memory map documentation Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-13 13:17   ` Kirill A. Shutemov
  2017-12-12 15:56 ` [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on Andy Lutomirski
  10 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski, Kirill A. Shutemov,
	Dave Hansen

This patch shrinks the vmalloc space from 16384 TiB to 12800 TiB to
enlarge the hole starting at 0xff90000000000000 to a full PGD
entry.

A subsequent patch will use this hole for the pagetable isolation
LDT alias.
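
The arithmetic, for reference (each 5-level PGD entry spans 1UL << 48
bytes = 256 TiB; this snippet compiles standalone):

	#include <assert.h>

	int main(void)
	{
		unsigned long pgd_span = 1UL << 48;	/* 256 TiB */

		/* old vmalloc space: 64 entries * 256 TiB = 16384 TiB */
		assert(0xffd2000000000000UL - 0xff92000000000000UL ==
		       64 * pgd_span);
		/* new vmalloc space: 50 entries * 256 TiB = 12800 TiB */
		assert(0xffd2000000000000UL - 0xffa0000000000000UL ==
		       50 * pgd_span);
		return 0;
	}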

Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 Documentation/x86/x86_64/mm.txt         | 4 ++--
 arch/x86/include/asm/pgtable_64_types.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 63a41671d25b..6a28aeaccd53 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -28,8 +28,8 @@ Virtual memory map with 5 level page tables:
 hole caused by [56:63] sign extension
 ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
 ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff91ffffffffffff (=49 bits) hole
-ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
+ff90000000000000 - ff9fffffffffffff (=52 bits) hole
+ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
 ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
 ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
 ... unused hole ...
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 6d5f45dcd4a1..6e2f50843c7d 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -78,8 +78,8 @@ typedef struct { pteval_t pte; } pte_t;
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #define MAXMEM		_AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
 #ifdef CONFIG_X86_5LEVEL
-#define VMALLOC_SIZE_TB _AC(16384, UL)
-#define __VMALLOC_BASE	_AC(0xff92000000000000, UL)
+#define VMALLOC_SIZE_TB _AC(12800, UL)
+#define __VMALLOC_BASE	_AC(0xffa0000000000000, UL)
 #define __VMEMMAP_BASE	_AC(0xffd4000000000000, UL)
 #else
 #define VMALLOC_SIZE_TB	_AC(32, UL)
-- 
2.13.6


* [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on
  2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
                   ` (9 preceding siblings ...)
  2017-12-12 15:56 ` [PATCH PTI v3 09/10] x86/mm/64: Make a full PGD-entry size hole in the memory map Andy Lutomirski
@ 2017-12-12 15:56 ` Andy Lutomirski
  2017-12-15 22:54   ` Thomas Gleixner
  2017-12-16  0:39   ` Thomas Gleixner
  10 siblings, 2 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:56 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Andy Lutomirski, Kirill A. Shutemov,
	Dave Hansen

With PTI on, we need the LDT to be in the usermode tables somewhere,
and the LDT is per-mm.

tglx had a hack to have a per-cpu LDT and context switch it, but it
was probably insanely slow due to the required TLB flushes.

Instead, take advantage of the fact that we have an address space
hole that gives us a completely unused pgd and make that pgd be
per-mm.  We can put the LDT in it.

This has a downside: the LDT isn't (currently) randomized, and an
attack that can write the LDT is instant root due to call gates
(thanks, AMD, for leaving call gates in AMD64 but designing them
wrong so they're only useful for exploits).  We could mitigate this
by making the LDT read-only or randomizing it, either of which is
straightforward on top of this patch.

This will significantly slow down LDT users, but that shouldn't
matter for important workloads -- the LDT is only used by DOSEMU(2),
Wine, and very old libc implementations.
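
The two-slot scheme, restated from the code below: each mm's LDT PGD
holds two page-aligned slots, and a new LDT is mapped into whichever
slot the current one is not using, so CPUs still pointing at the old
alias keep working until install_ldt() switches them over.

	static void *ldt_slot_va(int slot)
	{
		return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
	}

	/* in write_ldt(): pick the slot the old LDT does not occupy */
	error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0);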

Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 Documentation/x86/x86_64/mm.txt         |   3 +-
 arch/x86/include/asm/mmu_context.h      |  48 +++++++++-
 arch/x86/include/asm/pgtable_64_types.h |   4 +
 arch/x86/include/asm/processor.h        |  23 +++--
 arch/x86/kernel/ldt.c                   | 155 ++++++++++++++++++++++++++++++--
 arch/x86/mm/dump_pagetables.c           |  12 +++
 6 files changed, 225 insertions(+), 20 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 6a28aeaccd53..0a9925881e37 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,6 +12,7 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
 ... unused hole ...
 ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
 ... unused hole ...
+fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
 ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ... unused hole ...
 ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
@@ -28,7 +29,7 @@ Virtual memory map with 5 level page tables:
 hole caused by [56:63] sign extension
 ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
 ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - ff9fffffffffffff (=52 bits) hole
+ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
 ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
 ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
 ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 5e1a1ecb65c6..6088f508b623 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -52,13 +52,33 @@ struct ldt_struct {
 	 */
 	struct desc_struct *entries;
 	unsigned int nr_entries;
+
+	/*
+	 * If PTI is in use, then the entries array is not mapped while we're
+	 * in user mode.  The whole array will be aliased at the address
+	 * given by ldt_slot_va(slot).  We use two slots so that we can allocate
+	 * and map, and enable a new LDT without invalidating the mapping
+	 * of an older, still-in-use LDT.
+	 *
+	 * slot will be -1 if this LDT doesn't have an alias mapping.
+	 */
+	int slot;
 };
 
+/* This is a multiple of PAGE_SIZE. */
+#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
+
+static void *ldt_slot_va(int slot)
+{
+	return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
+}
+
 /*
  * Used for LDT copy/destruction.
  */
 int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm);
 void destroy_context_ldt(struct mm_struct *mm);
+void ldt_arch_exit_mmap(struct mm_struct *mm);
 #else	/* CONFIG_MODIFY_LDT_SYSCALL */
 static inline int init_new_context_ldt(struct task_struct *tsk,
 				       struct mm_struct *mm)
@@ -90,10 +110,31 @@ static inline void load_mm_ldt(struct mm_struct *mm)
 	 * that we can see.
 	 */
 
-	if (unlikely(ldt))
-		set_ldt(ldt->entries, ldt->nr_entries);
-	else
+	if (unlikely(ldt)) {
+		if (static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI)) {
+			if (WARN_ON_ONCE((unsigned long)ldt->slot > 1)) {
+				/*
+				 * Whoops -- either the new LDT isn't mapped
+				 * (if slot == -1) or is mapped into a bogus
+				 * slot (if slot > 1).
+				 */
+				clear_LDT();
+				return;
+			}
+
+			/*
+			 * If page table isolation is enabled, ldt->entries
+			 * will not be mapped in the userspace pagetables.
+			 * Tell the CPU to access the LDT through the alias
+			 * at ldt_slot_va(ldt->slot).
+			 */
+			set_ldt(ldt_slot_va(ldt->slot), ldt->nr_entries);
+		} else {
+			set_ldt(ldt->entries, ldt->nr_entries);
+		}
+	} else {
 		clear_LDT();
+	}
 #else
 	clear_LDT();
 #endif
@@ -185,6 +226,7 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
 static inline void arch_exit_mmap(struct mm_struct *mm)
 {
 	paravirt_arch_exit_mmap(mm);
+	ldt_arch_exit_mmap(mm);
 }
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 6e2f50843c7d..10ea949df97f 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -81,10 +81,14 @@ typedef struct { pteval_t pte; } pte_t;
 #define VMALLOC_SIZE_TB _AC(12800, UL)
 #define __VMALLOC_BASE	_AC(0xffa0000000000000, UL)
 #define __VMEMMAP_BASE	_AC(0xffd4000000000000, UL)
+#define LDT_PGD_ENTRY _AC(-112, UL)
+#define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
 #else
 #define VMALLOC_SIZE_TB	_AC(32, UL)
 #define __VMALLOC_BASE	_AC(0xffffc90000000000, UL)
 #define __VMEMMAP_BASE	_AC(0xffffea0000000000, UL)
+#define LDT_PGD_ENTRY _AC(-3, UL)
+#define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT)
 #endif
 #ifdef CONFIG_RANDOMIZE_MEMORY
 #define VMALLOC_START	vmalloc_base
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9e482d8b0b97..9c18da64daa9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -851,13 +851,22 @@ static inline void spin_lock_prefetch(const void *x)
 
 #else
 /*
- * User space process size. 47bits minus one guard page.  The guard
- * page is necessary on Intel CPUs: if a SYSCALL instruction is at
- * the highest possible canonical userspace address, then that
- * syscall will enter the kernel with a non-canonical return
- * address, and SYSRET will explode dangerously.  We avoid this
- * particular problem by preventing anything from being mapped
- * at the maximum canonical address.
+ * User space process size.  This is the first address outside the user range.
+ * There are a few constraints that determine this:
+ *
+ * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
+ * address, then that syscall will enter the kernel with a
+ * non-canonical return address, and SYSRET will explode dangerously.
+ * We avoid this particular problem by preventing anything executable
+ * from being mapped at the maximum canonical address.
+ *
+ * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
+ * CPUs malfunction if they execute code from the highest canonical page.
+ * They'll speculate right off the end of the canonical space, and
+ * bad things happen.  This is worked around in the same way as the
+ * Intel problem.
+ *
+ * With page table isolation enabled, we map the LDT in ... [stay tuned]
  */
 #define TASK_SIZE_MAX	((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
 
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index ae5615b03def..fe5e024b3a20 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -19,6 +19,7 @@
 #include <linux/uaccess.h>
 
 #include <asm/ldt.h>
+#include <asm/tlb.h>
 #include <asm/desc.h>
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
@@ -46,13 +47,11 @@ static void refresh_ldt_segments(void)
 static void flush_ldt(void *__mm)
 {
 	struct mm_struct *mm = __mm;
-	mm_context_t *pc;
 
 	if (this_cpu_read(cpu_tlbstate.loaded_mm) != mm)
 		return;
 
-	pc = &mm->context;
-	set_ldt(pc->ldt->entries, pc->ldt->nr_entries);
+	load_mm_ldt(mm);
 
 	refresh_ldt_segments();
 }
@@ -89,10 +88,124 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 		return NULL;
 	}
 
+	/* The new LDT isn't aliased for PTI yet. */
+	new_ldt->slot = -1;
+
 	new_ldt->nr_entries = num_entries;
 	return new_ldt;
 }
 
+/*
+ * If PTI is enabled, this maps the LDT into the kernelmode and
+ * usermode tables for the given mm.
+ *
+ * There is no corresponding unmap function.  Even if the LDT is freed, we
+ * leave the PTEs around until the slot is reused or the mm is destroyed.
+ * This is harmless: the LDT is always in ordinary memory, and no one will
+ * access the freed slot.
+ *
+ * If we wanted to unmap freed LDTs, we'd also need to do a flush to make
+ * it useful, and the flush would slow down modify_ldt().
+ */
+static int map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	spinlock_t *ptl;
+	bool is_vmalloc;
+	bool had_top_level_entry;
+	pgd_t *pgd;
+	int i;
+
+	if (!static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI))
+		return 0;
+
+	/*
+	 * Any given ldt_struct should have map_ldt_struct() called at most
+	 * once.
+	 */
+	WARN_ON(ldt->slot != -1);
+
+	/*
+	 * Did we already have the top level entry allocated?  We can't
+	 * use pgd_none() for this because it doesn't do anything on
+	 * 4-level page table kernels.
+	 */
+	pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	had_top_level_entry = (pgd->pgd != 0);
+
+	is_vmalloc = is_vmalloc_addr(ldt->entries);
+
+	for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) {
+		unsigned long offset = i << PAGE_SHIFT;
+		unsigned long va = (unsigned long)ldt_slot_va(slot) + offset;
+		const void *src = (char *)ldt->entries + offset;
+		unsigned long pfn = is_vmalloc ? vmalloc_to_pfn(src) :
+			page_to_pfn(virt_to_page(src));
+		pte_t pte, *ptep;
+
+		/*
+		 * Treat the PTI LDT range as a *userspace* range.
+		 * get_locked_pte() will allocate all needed pagetables
+		 * and account for them in this mm.
+		 */
+		ptep = get_locked_pte(mm, va, &ptl);
+		if (!ptep)
+			return -ENOMEM;
+		pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL & ~_PAGE_GLOBAL));
+		set_pte_at(mm, va, ptep, pte);
+		pte_unmap_unlock(ptep, ptl);
+	}
+
+	if (mm->context.ldt) {
+		/*
+		 * We already had an LDT.  The top-level entry should already
+		 * have been allocated and synchronized with the usermode
+		 * tables.
+		 */
+		WARN_ON(!had_top_level_entry);
+		if (static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI))
+			WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
+	} else {
+		/*
+		 * This is the first time we're mapping an LDT for this process.
+		 * Sync the pgd to the usermode tables.
+		 */
+		WARN_ON(had_top_level_entry);
+		if (static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI)) {
+			WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
+			set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+		}
+	}
+
+	flush_tlb_mm_range(mm,
+			   (unsigned long)ldt_slot_va(slot),
+			   (unsigned long)ldt_slot_va(slot) + LDT_SLOT_STRIDE,
+			   0);
+
+	ldt->slot = slot;
+
+	return 0;
+#else
+	return -EINVAL;
+#endif
+}
+
+static void free_ldt_pgtables(struct mm_struct *mm)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	struct mmu_gather tlb;
+	unsigned long start = LDT_BASE_ADDR;
+	unsigned long end = start + (1UL << PGDIR_SHIFT);
+
+	if (!static_cpu_has_bug(X86_BUG_CPU_SECURE_MODE_PTI))
+		return;
+
+	tlb_gather_mmu(&tlb, mm, start, end);
+	free_pgd_range(&tlb, start, end, start, end);
+	tlb_finish_mmu(&tlb, start, end);
+#endif
+}
+
 /* After calling this, the LDT is immutable. */
 static void finalize_ldt_struct(struct ldt_struct *ldt)
 {
@@ -134,17 +247,15 @@ int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm)
 	int retval = 0;
 
 	mutex_init(&mm->context.lock);
+	mm->context.ldt = NULL;
+
 	old_mm = current->mm;
-	if (!old_mm) {
-		mm->context.ldt = NULL;
+	if (!old_mm)
 		return 0;
-	}
 
 	mutex_lock(&old_mm->context.lock);
-	if (!old_mm->context.ldt) {
-		mm->context.ldt = NULL;
+	if (!old_mm->context.ldt)
 		goto out_unlock;
-	}
 
 	new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries);
 	if (!new_ldt) {
@@ -155,8 +266,17 @@ int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm)
 	memcpy(new_ldt->entries, old_mm->context.ldt->entries,
 	       new_ldt->nr_entries * LDT_ENTRY_SIZE);
 	finalize_ldt_struct(new_ldt);
+	retval = map_ldt_struct(mm, new_ldt, 0);
+	if (retval)
+		goto out_free;
 
 	mm->context.ldt = new_ldt;
+	goto out_unlock;
+
+out_free:
+	free_ldt_pgtables(mm);
+	free_ldt_struct(new_ldt);
+	return retval;
 
 out_unlock:
 	mutex_unlock(&old_mm->context.lock);
@@ -174,6 +294,11 @@ void destroy_context_ldt(struct mm_struct *mm)
 	mm->context.ldt = NULL;
 }
 
+void ldt_arch_exit_mmap(struct mm_struct *mm)
+{
+	free_ldt_pgtables(mm);
+}
+
 static int read_ldt(void __user *ptr, unsigned long bytecount)
 {
 	struct mm_struct *mm = current->mm;
@@ -286,6 +411,18 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	new_ldt->entries[ldt_info.entry_number] = ldt;
 	finalize_ldt_struct(new_ldt);
 
+	/*
+	 * If we are using PTI, map the new LDT into the userspace pagetables.
+	 * If there is already an LDT, use the other slot so that other CPUs
+	 * will continue to use the old LDT until install_ldt() switches
+	 * them over to the new LDT.
+	 */
+	error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0);
+	if (error) {
+		free_ldt_struct(old_ldt);
+		goto out_unlock;
+	}
+
 	install_ldt(mm, new_ldt);
 	free_ldt_struct(old_ldt);
 	error = 0;
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index e5a2df886130..878b3fdc00a5 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -50,12 +50,18 @@ enum address_markers_idx {
 #ifdef CONFIG_X86_64
 	KERNEL_SPACE_NR,
 	LOW_KERNEL_NR,
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && defined(CONFIG_X86_5LEVEL)
+	LDT_NR,
+#endif
 	VMALLOC_START_NR,
 	VMEMMAP_START_NR,
 #ifdef CONFIG_KASAN
 	KASAN_SHADOW_START_NR,
 	KASAN_SHADOW_END_NR,
 #endif
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && !defined(CONFIG_X86_5LEVEL)
+	LDT_NR,
+#endif
 # ifdef CONFIG_X86_ESPFIX64
 	ESPFIX_START_NR,
 # endif
@@ -79,12 +85,18 @@ static struct addr_marker address_markers[] = {
 #ifdef CONFIG_X86_64
 	{ 0x8000000000000000UL, "Kernel Space" },
 	{ 0/* PAGE_OFFSET */,   "Low Kernel Mapping" },
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && defined(CONFIG_X86_5LEVEL)
+	{ LDT_BASE_ADDR,	"LDT remap" },
+#endif
 	{ 0/* VMALLOC_START */, "vmalloc() Area" },
 	{ 0/* VMEMMAP_START */, "Vmemmap" },
 #ifdef CONFIG_KASAN
 	{ KASAN_SHADOW_START,	"KASAN shadow" },
 	{ KASAN_SHADOW_END,	"KASAN shadow end" },
 #endif
+#if defined(CONFIG_MODIFY_LDT_SYSCALL) && !defined(CONFIG_X86_5LEVEL)
+	{ LDT_BASE_ADDR,	"LDT remap" },
+#endif
 # ifdef CONFIG_X86_ESPFIX64
 	{ ESPFIX_BASE_ADDR,	"ESPfix Area", 16 },
 # endif
-- 
2.13.6


* Re: [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT
  2017-12-12 10:09 ` Ingo Molnar
@ 2017-12-12 15:58   ` Andy Lutomirski
  2017-12-12 16:13     ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-12 15:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Borislav Petkov,
	Brian Gerst, David Laight, Kees Cook, Peter Zijlstra

On Tue, Dec 12, 2017 at 2:09 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@kernel.org> wrote:
>
>> This should fix some existing 5-level bugs and get VSYSCALL and LDT
>> working with PTI.
>>
>> Changes from v1:
>>  - vsyscalls actually work.
>>  - Added the "Warn and fail" patch to prevent the testing goof I had on v1.
>>  - Lots of cleanups
>>
>> Andy Lutomirski (10):
>>   x86/espfix/64: Fix espfix double-fault handling on 5-level systems
>>   x86/pti: Vastly simplify pgd synchronization
>>   x86/pti/64: Fix ESPFIX64 user mapping
>>   Revert "x86/mm/pti: Disable native VSYSCALL"
>>   x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
>>   x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
>>   x86/pti: Map the vsyscall page if needed
>>   x86/mm/64: Improve the memory map documentation
>>   x86/mm/64: Make a full PGD-entry size hole in the memory map
>>   x86/pti: Put the LDT in its own PGD if PTI is on
>>
>>  Documentation/x86/x86_64/mm.txt         |  15 +--
>>  arch/x86/Kconfig                        |   8 --
>>  arch/x86/entry/vsyscall/vsyscall_64.c   |  37 +++++++-
>>  arch/x86/include/asm/mmu_context.h      |  48 +++++++++-
>>  arch/x86/include/asm/pgtable.h          |   6 +-
>>  arch/x86/include/asm/pgtable_64.h       |  77 +++++++---------
>>  arch/x86/include/asm/pgtable_64_types.h |   8 +-
>>  arch/x86/include/asm/processor.h        |  23 +++--
>>  arch/x86/include/asm/vsyscall.h         |   1 +
>>  arch/x86/kernel/espfix_64.c             |  16 ----
>>  arch/x86/kernel/ldt.c                   | 155 +++++++++++++++++++++++++++++--
>>  arch/x86/kernel/traps.c                 |   2 +-
>>  arch/x86/mm/dump_pagetables.c           |  12 +++
>>  arch/x86/mm/pti.c                       | 157 ++++++++++++++++++++++----------
>>  init/main.c                             |  11 ++-
>>  15 files changed, 426 insertions(+), 150 deletions(-)
>
> Hm, I only received the 0/10 boilerplate email, not any of the patches - and
> Thomas tells me he too only got the cover letter.

Ugh, the whole send-email process exploded due to Cc:
stable@vger.kernel.org # 4.14.  I stripped the # 4.14 and it's okay
now.

>
> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT
  2017-12-12 15:58   ` Andy Lutomirski
@ 2017-12-12 16:13     ` Borislav Petkov
  2017-12-12 16:14       ` Juergen Gross
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2017-12-12 16:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, X86 ML, linux-kernel, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra

On Tue, Dec 12, 2017 at 07:58:07AM -0800, Andy Lutomirski wrote:
> Ugh, the whole send-email process exploded due to Cc:
> stable@vger.kernel.org # 4.14.  I stripped the # 4.14 and it's okay
> now.

In the future, strip the whole Cc: stable line directly. git
send-email's --suppress-cc can do that, although it's kinda clumsy
for the purpose... or I haven't figured it out properly.
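
FWIW, the bodycc category might be the narrower hammer here -- assuming
it does what the manpage promises (drop only Cc: trailers picked up
from the patch body, keep everything else), something like:

	git send-email --suppress-cc=bodycc --to=<list> *.patch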

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT
  2017-12-12 16:13     ` Borislav Petkov
@ 2017-12-12 16:14       ` Juergen Gross
  2017-12-12 16:20         ` Borislav Petkov
  0 siblings, 1 reply; 27+ messages in thread
From: Juergen Gross @ 2017-12-12 16:14 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski
  Cc: Ingo Molnar, X86 ML, linux-kernel, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra

On 12/12/17 17:13, Borislav Petkov wrote:
> On Tue, Dec 12, 2017 at 07:58:07AM -0800, Andy Lutomirski wrote:
>> Ugh, the whole send-email process exploded due to Cc:
>> stable@vger.kernel.org # 4.14.  I stripped the # 4.14 and it's okay
>> now.
> 
> In the future, strip the whole CC: stable directly. Although git
> send-email with --suppress-cc is kinda clumsy for that... or I haven't
> figured it out properly.
> 

Using

Cc: <stable@vger.kernel.org> # 4.14

works for me.


Juergen

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT
  2017-12-12 16:14       ` Juergen Gross
@ 2017-12-12 16:20         ` Borislav Petkov
  0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2017-12-12 16:20 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andy Lutomirski, Ingo Molnar, X86 ML, linux-kernel, Brian Gerst,
	David Laight, Kees Cook, Peter Zijlstra

On Tue, Dec 12, 2017 at 05:14:58PM +0100, Juergen Gross wrote:
> Using
> 
> Cc: <stable@vger.kernel.org> # 4.14
> 
> works for me.

... except that you should not CC: stable on submission.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 01/10] x86/espfix/64: Fix espfix double-fault handling on 5-level systems
  2017-12-12 15:56 ` [PATCH PTI v3 01/10] x86/espfix/64: Fix espfix double-fault handling on 5-level systems Andy Lutomirski
@ 2017-12-12 17:18   ` Kirill A. Shutemov
  2017-12-15 18:34   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 27+ messages in thread
From: Kirill A. Shutemov @ 2017-12-12 17:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, stable, Dave Hansen

On Tue, Dec 12, 2017 at 07:56:36AM -0800, Andy Lutomirski wrote:
> Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems
> was wrong, and it resulted in panics due to unhandled double faults.
> Use P4D_SHIFT instead, which is correct on 4-level and 5-level
> machines.
> 
> This fixes a panic when running x86 selftests on 5-level machines.
> 
> Fixes: 1d33b219563f ("x86/espfix: Add support for 5-level paging")
> Cc: stable@vger.kernel.org
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Thanks for catching this.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

I remember that I tested espfix before and it was fine. That's strange.

With this patch on top of tip/WIP.x86/pti plus the change I've mentioned
before, sigreturn_64 doesn't crash qemu for me anymore.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping
  2017-12-12 15:56 ` [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping Andy Lutomirski
@ 2017-12-13 13:12   ` Kirill A. Shutemov
  2017-12-13 17:01     ` Andy Lutomirski
  0 siblings, 1 reply; 27+ messages in thread
From: Kirill A. Shutemov @ 2017-12-13 13:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra

On Tue, Dec 12, 2017 at 07:56:38AM -0800, Andy Lutomirski wrote:
> The ESPFIX64 user mapping belongs in pti.c just like all the other
> user mappings.  Move it there and make it work correctly while we're
> at it.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

BTW, why do we open-code p?d_alloc() in pti_user_pagetable_walk_*()?
It seems unnecessary and potentially bogus: see smp_wmb() in __p?d_alloc()
helpers.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 09/10] x86/mm/64: Make a full PGD-entry size hole in the memory map
  2017-12-12 15:56 ` [PATCH PTI v3 09/10] x86/mm/64: Make a full PGD-entry size hole in the memory map Andy Lutomirski
@ 2017-12-13 13:17   ` Kirill A. Shutemov
  2017-12-13 17:04     ` Andy Lutomirski
  0 siblings, 1 reply; 27+ messages in thread
From: Kirill A. Shutemov @ 2017-12-13 13:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Dave Hansen

On Tue, Dec 12, 2017 at 07:56:44AM -0800, Andy Lutomirski wrote:
> This patch shrinks vmalloc space from 16384TiB to 12800TiB to
> enlarge the hole starting at 0xff90000000000000 to be a full PGD
> entry.
> 
> A subsequent patch will use this hole for the pagetable isolation
> LDT alias.
> 
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  Documentation/x86/x86_64/mm.txt         | 4 ++--
>  arch/x86/include/asm/pgtable_64_types.h | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
> index 63a41671d25b..6a28aeaccd53 100644
> --- a/Documentation/x86/x86_64/mm.txt
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -28,8 +28,8 @@ Virtual memory map with 5 level page tables:
>  hole caused by [56:63] sign extension
>  ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
>  ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
> -ff90000000000000 - ff91ffffffffffff (=49 bits) hole
> -ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
> +ff90000000000000 - ff9fffffffffffff (=52 bits) hole
> +ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space

It's not 54 bits anymore.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping
  2017-12-13 13:12   ` Kirill A. Shutemov
@ 2017-12-13 17:01     ` Andy Lutomirski
  2017-12-14 14:10       ` Kirill A. Shutemov
  0 siblings, 1 reply; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-13 17:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Borislav Petkov,
	Brian Gerst, David Laight, Kees Cook, Peter Zijlstra

On Wed, Dec 13, 2017 at 5:12 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Tue, Dec 12, 2017 at 07:56:38AM -0800, Andy Lutomirski wrote:
>> The ESPFIX64 user mapping belongs in pti.c just like all the other
>> user mappings.  Move it there and make it work correctly while we're
>> at it.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> BTW, why do we open-code p?d_alloc() in pti_user_pagetable_walk_*()?
> It seems unnecessary and potentially bogus: see smp_wmb() in __p?d_alloc()
> helpers.

The helpers won't work -- we're allocating kernel-owned tables in the
usermode part of init_mm.  The p?d_alloc() helpers allocate
user-accounted tables in the kernelmode part of the mm.

--Andy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 09/10] x86/mm/64: Make a full PGD-entry size hole in the memory map
  2017-12-13 13:17   ` Kirill A. Shutemov
@ 2017-12-13 17:04     ` Andy Lutomirski
  0 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-13 17:04 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Borislav Petkov,
	Brian Gerst, David Laight, Kees Cook, Peter Zijlstra,
	Dave Hansen

Whoops.  I'll change that to "(=12800 TB)".
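
Quick arithmetic on the new range, for the record:

	0xffd2000000000000 - 0xffa0000000000000 = 0x32 << 48 bytes
	                                        = 50 * 256 TiB
	                                        = 12800 TiB

which is no longer a power of two, hence no "(=NN bits)" annotation fits.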

On Wed, Dec 13, 2017 at 5:17 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Tue, Dec 12, 2017 at 07:56:44AM -0800, Andy Lutomirski wrote:
>> This patch shrinks vmalloc space from 16384TiB to 12800TiB to
>> enlarge the hole starting at 0xff90000000000000 to be a full PGD
>> entry.
>>
>> A subsequent patch will use this hole for the pagetable isolation
>> LDT alias.
>>
>> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  Documentation/x86/x86_64/mm.txt         | 4 ++--
>>  arch/x86/include/asm/pgtable_64_types.h | 4 ++--
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
>> index 63a41671d25b..6a28aeaccd53 100644
>> --- a/Documentation/x86/x86_64/mm.txt
>> +++ b/Documentation/x86/x86_64/mm.txt
>> @@ -28,8 +28,8 @@ Virtual memory map with 5 level page tables:
>>  hole caused by [56:63] sign extension
>>  ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
>>  ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
>> -ff90000000000000 - ff91ffffffffffff (=49 bits) hole
>> -ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
>> +ff90000000000000 - ff9fffffffffffff (=52 bits) hole
>> +ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
>
> It's not 54 bits anymore.
>
> --
>  Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping
  2017-12-13 17:01     ` Andy Lutomirski
@ 2017-12-14 14:10       ` Kirill A. Shutemov
  2017-12-14 16:18         ` Andy Lutomirski
  0 siblings, 1 reply; 27+ messages in thread
From: Kirill A. Shutemov @ 2017-12-14 14:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra

On Wed, Dec 13, 2017 at 09:01:50AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 13, 2017 at 5:12 AM, Kirill A. Shutemov
> <kirill@shutemov.name> wrote:
> > On Tue, Dec 12, 2017 at 07:56:38AM -0800, Andy Lutomirski wrote:
> >> The ESPFIX64 user mapping belongs in pti.c just like all the other
> >> user mappings.  Move it there and make it work correctly while we're
> >> at it.
> >>
> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >
> > BTW, why do we open-code p?d_alloc() in pti_user_pagetable_walk_*()?
> > It seems unnecessary and potentially bogus: see smp_wmb() in __p?d_alloc()
> > helpers.
> 
> The helpers won't work -- we're allocating kernel-owned tables in the
> usermode part of init_mm.  The p?d_alloc() helpers allocate
> user-accounted tables in the kernelmode part of the mm.

What's wrong with accounting them against init_mm?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping
  2017-12-14 14:10       ` Kirill A. Shutemov
@ 2017-12-14 16:18         ` Andy Lutomirski
  0 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-14 16:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Borislav Petkov,
	Brian Gerst, David Laight, Kees Cook, Peter Zijlstra

On Thu, Dec 14, 2017 at 6:10 AM, Kirill A. Shutemov
<kirill@shutemov.name> wrote:
> On Wed, Dec 13, 2017 at 09:01:50AM -0800, Andy Lutomirski wrote:
>> On Wed, Dec 13, 2017 at 5:12 AM, Kirill A. Shutemov
>> <kirill@shutemov.name> wrote:
>> > On Tue, Dec 12, 2017 at 07:56:38AM -0800, Andy Lutomirski wrote:
>> >> The ESPFIX64 user mapping belongs in pti.c just like all the other
>> >> user mappings.  Move it there and make it work correctly while we're
>> >> at it.
>> >>
>> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> >
>> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >
>> > BTW, why do we open-code p?d_alloc() in pti_user_pagetable_walk_*()?
>> > It seems unnecessary and potentially bogus: see smp_wmb() in __p?d_alloc()
>> > helpers.
>>
>> The helpers won't work -- we're allocating kernel-owned tables in the
>> usermode part of init_mm.  The p?d_alloc() helpers allocate
>> user-accounted tables in the kernelmode part of the mm.
>
> What's wrong with accounting them against init_mm?
>

Nothing.  But the other parts are showstoppers: the helpers won't set
_PAGE_USER in the higher level entries and they will put the pgds in
the wrong place.
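
For context, the open-coded walk looks roughly like this (a simplified
sketch from memory of the series, with the sanity checks trimmed -- not
a verbatim copy of pti.c):

	static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
	{
		/* Walk the *user* copy of the kernel pagetables. */
		pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
		gfp_t gfp = (GFP_KERNEL | __GFP_ZERO);

		if (pgd_none(*pgd)) {
			unsigned long new_p4d_page = __get_free_page(gfp);

			if (!new_p4d_page)
				return NULL;

			/*
			 * A kernel-owned table installed in the usermode
			 * half of init_mm -- exactly the combination
			 * p4d_alloc() cannot be told to produce.
			 */
			set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
		}

		return p4d_offset(pgd, address);
	}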

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [tip:x86/urgent] x86/espfix/64: Fix espfix double-fault handling on 5-level systems
  2017-12-12 15:56 ` [PATCH PTI v3 01/10] x86/espfix/64: Fix espfix double-fault handling on 5-level systems Andy Lutomirski
  2017-12-12 17:18   ` Kirill A. Shutemov
@ 2017-12-15 18:34   ` tip-bot for Andy Lutomirski
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot for Andy Lutomirski @ 2017-12-15 18:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, David.Laight, tglx, mingo, kirill, bp,
	kirill.shutemov, dave.hansen, keescook, hpa, luto, peterz,
	brgerst

Commit-ID:  c739f930be1dd5fd949030e3475a884fe06dae9b
Gitweb:     https://git.kernel.org/tip/c739f930be1dd5fd949030e3475a884fe06dae9b
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 12 Dec 2017 07:56:36 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 15 Dec 2017 16:58:53 +0100

x86/espfix/64: Fix espfix double-fault handling on 5-level systems

Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems
was wrong, and it resulted in panics due to unhandled double faults.
Use P4D_SHIFT instead, which is correct on 4-level and 5-level
machines.

This fixes a panic when running x86 selftests on 5-level machines.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1d33b219563f ("x86/espfix: Add support for 5-level paging")
Link: http://lkml.kernel.org/r/24c898b4f44fdf8c22d93703850fb384ef87cfdc.1513035461.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b7b0f74..c751518 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -355,7 +355,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
 	 *
 	 * No need for ist_enter here because we don't use RCU.
 	 */
-	if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY &&
+	if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY &&
 		regs->cs == __KERNEL_CS &&
 		regs->ip == (unsigned long)native_irq_return_iret)
 	{
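
(Why P4D_SHIFT is right on both paging modes -- the relevant shift
values, from pgtable_64_types.h:

	4-level: PGDIR_SHIFT == 39, P4D_SHIFT == 39  /* p4d folded into pgd */
	5-level: PGDIR_SHIFT == 48, P4D_SHIFT == 39

The espfix area has been defined at P4D granularity since the commit in
the Fixes: tag, so on 5-level kernels shifting regs->sp by PGDIR_SHIFT
compared an index from the wrong pagetable level and the fixup never
matched.)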

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on
  2017-12-12 15:56 ` [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on Andy Lutomirski
@ 2017-12-15 22:54   ` Thomas Gleixner
  2017-12-16  0:39   ` Thomas Gleixner
  1 sibling, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2017-12-15 22:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Kirill A. Shutemov, Dave Hansen

On Tue, 12 Dec 2017, Andy Lutomirski wrote:
> +/* This is a multiple of PAGE_SIZE. */
> +#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
> +
> +static void *ldt_slot_va(int slot)

How is that supposed to compile w/o warnings? Wants to be inline ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on
  2017-12-12 15:56 ` [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on Andy Lutomirski
  2017-12-15 22:54   ` Thomas Gleixner
@ 2017-12-16  0:39   ` Thomas Gleixner
  2017-12-16  6:41     ` Andy Lutomirski
  1 sibling, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2017-12-16  0:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, linux-kernel, Borislav Petkov, Brian Gerst, David Laight,
	Kees Cook, Peter Zijlstra, Kirill A. Shutemov, Dave Hansen

On Tue, 12 Dec 2017, Andy Lutomirski wrote:
> +
> +	return 0;
> +#else
> +	return -EINVAL;

Errm. What's the point of that? Breaking non-PTI?

>  	new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries);
>  	if (!new_ldt) {
> @@ -155,8 +266,17 @@ int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm)
>  	memcpy(new_ldt->entries, old_mm->context.ldt->entries,
>  	       new_ldt->nr_entries * LDT_ENTRY_SIZE);
>  	finalize_ldt_struct(new_ldt);
> +	retval = map_ldt_struct(mm, new_ldt, 0);
> +	if (retval)
> +		goto out_free;
>  
>  	mm->context.ldt = new_ldt;
> +	goto out_unlock;
> +
> +out_free:
> +	free_ldt_pgtables(mm);
> +	free_ldt_struct(new_ldt);
> +	return retval;

Leaks old_mm->context.lock.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on
  2017-12-16  0:39   ` Thomas Gleixner
@ 2017-12-16  6:41     ` Andy Lutomirski
  0 siblings, 0 replies; 27+ messages in thread
From: Andy Lutomirski @ 2017-12-16  6:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, X86 ML, linux-kernel, Borislav Petkov,
	Brian Gerst, David Laight, Kees Cook, Peter Zijlstra,
	Kirill A. Shutemov, Dave Hansen

On Fri, Dec 15, 2017 at 2:54 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 12 Dec 2017, Andy Lutomirski wrote:
>> +/* This is a multiple of PAGE_SIZE. */
>> +#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
>> +
>> +static void *ldt_slot_va(int slot)
>
> How is that supposed to compile w/o warnings? Wants to be inline ....

Good question.  But it does compile for me and apparently for the 0day
bot without warnings.  Fixed in my tree.
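
Presumably it ends up as something like this (a sketch of the inlined
variant, not the exact commit):

	/* This is a multiple of PAGE_SIZE. */
	#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)

	static inline void *ldt_slot_va(int slot)
	{
		return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
	}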

On Fri, Dec 15, 2017 at 4:39 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 12 Dec 2017, Andy Lutomirski wrote:
>> +
>> +     return 0;
>> +#else
>> +     return -EINVAL;
>
> Errm. What's the point of that? Breaking non-PTI?

That's embarrassing.
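
The stub for !CONFIG_PAGE_TABLE_ISOLATION should just succeed, since
there are no user pagetables to populate.  Hypothetical sketch of the
obvious shape:

	static int map_ldt_struct(struct mm_struct *mm,
				  struct ldt_struct *ldt, int slot)
	{
	#ifdef CONFIG_PAGE_TABLE_ISOLATION
		/* ... actually install the user-visible LDT alias ... */
		return 0;
	#else
		return 0;	/* nothing to map without PTI */
	#endif
	}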

>
>>       new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries);
>>       if (!new_ldt) {
>> @@ -155,8 +266,17 @@ int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm)
>>       memcpy(new_ldt->entries, old_mm->context.ldt->entries,
>>              new_ldt->nr_entries * LDT_ENTRY_SIZE);
>>       finalize_ldt_struct(new_ldt);
>> +     retval = map_ldt_struct(mm, new_ldt, 0);
>> +     if (retval)
>> +             goto out_free;
>>
>>       mm->context.ldt = new_ldt;
>> +     goto out_unlock;
>> +
>> +out_free:
>> +     free_ldt_pgtables(mm);
>> +     free_ldt_struct(new_ldt);
>> +     return retval;
>
> Leaks old_mm->context.lock.
>

Indeed.

I'm going to go test the failure paths better tomorrow.  Meanwhile,
all three issues should be fixed in my tree.
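
For the archives, the natural shape of that fix is to funnel the error
path through the unlock as well -- a sketch, assuming the mutex taken
earlier in init_new_context_ldt() is old_mm->context.lock:

	out_free:
		free_ldt_pgtables(mm);
		free_ldt_struct(new_ldt);
	out_unlock:
		mutex_unlock(&old_mm->context.lock);
		return retval;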

^ permalink raw reply	[flat|nested] 27+ messages in thread

Thread overview: 27+ messages
2017-12-11 23:40 [PATCH PTI v3 00/10] Clean up pgd handling and fix VSYSCALL and LDT Andy Lutomirski
2017-12-12 10:09 ` Ingo Molnar
2017-12-12 15:58   ` Andy Lutomirski
2017-12-12 16:13     ` Borislav Petkov
2017-12-12 16:14       ` Juergen Gross
2017-12-12 16:20         ` Borislav Petkov
2017-12-12 15:56 ` [PATCH PTI v3 01/10] x86/espfix/64: Fix espfix double-fault handling on 5-level systems Andy Lutomirski
2017-12-12 17:18   ` Kirill A. Shutemov
2017-12-15 18:34   ` [tip:x86/urgent] " tip-bot for Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 02/10] x86/pti: Vastly simplify pgd synchronization Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 03/10] x86/pti/64: Fix ESPFIX64 user mapping Andy Lutomirski
2017-12-13 13:12   ` Kirill A. Shutemov
2017-12-13 17:01     ` Andy Lutomirski
2017-12-14 14:10       ` Kirill A. Shutemov
2017-12-14 16:18         ` Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 04/10] Revert "x86/mm/pti: Disable native VSYSCALL" Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 05/10] x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 06/10] x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 07/10] x86/pti: Map the vsyscall page if needed Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 08/10] x86/mm/64: Improve the memory map documentation Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 09/10] x86/mm/64: Make a full PGD-entry size hole in the memory map Andy Lutomirski
2017-12-13 13:17   ` Kirill A. Shutemov
2017-12-13 17:04     ` Andy Lutomirski
2017-12-12 15:56 ` [PATCH PTI v3 10/10] x86/pti: Put the LDT in its own PGD if PTI is on Andy Lutomirski
2017-12-15 22:54   ` Thomas Gleixner
2017-12-16  0:39   ` Thomas Gleixner
2017-12-16  6:41     ` Andy Lutomirski
