linux-kernel.vger.kernel.org archive mirror
* [PATCHv2 0/2] Fix couple of issues with LDT remap for PTI
@ 2018-10-24 12:51 Kirill A. Shutemov
  2018-10-24 12:51 ` [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
  2018-10-24 12:51 ` [PATCHv2 2/2] x86/ldt: Unmap PTEs for the slot before freeing LDT pages Kirill A. Shutemov
  0 siblings, 2 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2018-10-24 12:51 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov

The patchset fixes issues with LDT remap for PTI:

 - Layout collision due to KASLR with 5-level paging;

 - Information leak via a Meltdown-like attack.

Please review and consider applying.

v2:
 - Rebase onto Linus' tree
   + fix a conflict with the new documentation of the kernel memory layout
   + fix a few mistakes in the layout documentation
 - Fix a typo in the commit message

Kirill A. Shutemov (2):
  x86/mm: Move LDT remap out of KASLR region on 5-level paging
  x86/ldt: Unmap PTEs for the slot before freeing LDT pages

 Documentation/x86/x86_64/mm.txt         | 34 +++++++-------
 arch/x86/include/asm/page_64_types.h    | 12 ++---
 arch/x86/include/asm/pgtable_64_types.h |  4 +-
 arch/x86/kernel/ldt.c                   | 59 ++++++++++++++++---------
 arch/x86/xen/mmu_pv.c                   |  6 +--
 5 files changed, 67 insertions(+), 48 deletions(-)

-- 
2.19.1



* [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-24 12:51 [PATCHv2 0/2] Fix couple of issues with LDT remap for PTI Kirill A. Shutemov
@ 2018-10-24 12:51 ` Kirill A. Shutemov
  2018-10-24 13:12   ` Matthew Wilcox
  2018-10-25  2:18   ` Baoquan He
  2018-10-24 12:51 ` [PATCHv2 2/2] x86/ldt: Unmap PTEs for the slot before freeing LDT pages Kirill A. Shutemov
  1 sibling, 2 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2018-10-24 12:51 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov

On 5-level paging, the LDT remap area is placed in the middle of the
KASLR randomization region and can overlap with the direct mapping,
vmalloc or vmap area.

Let's move the LDT remap just before the direct mapping, which makes it
safe for KASLR. This also allows us to unify the layout between 4- and
5-level paging.

We don't touch the 16 pgd slot gap just before the direct mapping, which
is reserved for a hypervisor, but move the direct mapping by one slot
instead.

The LDT mapping is per-mm, so we cannot move it into the P4D page table
next to CPU_ENTRY_AREA without complicating PGD table allocation for
5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
---
 Documentation/x86/x86_64/mm.txt         | 34 +++++++++++++------------
 arch/x86/include/asm/page_64_types.h    | 12 +++++----
 arch/x86/include/asm/pgtable_64_types.h |  4 +--
 arch/x86/xen/mmu_pv.c                   |  6 ++---
 4 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 702898633b00..75bff98928a8 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -34,23 +34,24 @@ __________________|____________|__________________|_________|___________________
 ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
- ffff880000000000 | -120    TB | ffffc7ffffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
- ffffc80000000000 |  -56    TB | ffffc8ffffffffff |    1 TB | ... unused hole
+ ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
+ ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
+ ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
  ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
  ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
  ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
  ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
  ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
- fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
-                  |            |                  |         | vaddr_end for KASLR
- fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
- fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | LDT remap for PTI
- ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 __________________|____________|__________________|_________|____________________________________________________________
                                                             |
-                                                            | Identical layout to the 47-bit one from here on:
+                                                            | Identical layout to the 56-bit one from here on:
 ____________________________________________________________|____________________________________________________________
                   |            |                  |         |
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
  ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
  ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
  ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
@@ -83,7 +84,7 @@ Notes:
 __________________|____________|__________________|_________|___________________________________________________________
                   |            |                  |         |
  0000800000000000 |  +64    PB | ffff7fffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
-                  |            |                  |         |     virtual memory addresses up to the -128 TB
+                  |            |                  |         |     virtual memory addresses up to the -64 PB
                   |            |                  |         |     starting offset of kernel mappings.
 __________________|____________|__________________|_________|___________________________________________________________
                                                             |
@@ -91,23 +92,24 @@ __________________|____________|__________________|_________|___________________
 ____________________________________________________________|___________________________________________________________
                   |            |                  |         |
  ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
- ff10000000000000 |  -60    PB | ff8fffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
- ff90000000000000 |  -28    PB | ff9fffffffffffff |    4 PB | LDT remap for PTI
+ ff10000000000000 |  -60    PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
+ ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
+ ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
  ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
  ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
  ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
  ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
  ffdf000000000000 |   -8.25 PB | fffffdffffffffff |   ~8 PB | KASAN shadow memory
- fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
-                  |            |                  |         | vaddr_end for KASLR
- fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
- fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
- ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 __________________|____________|__________________|_________|____________________________________________________________
                                                             |
                                                             | Identical layout to the 47-bit one from here on:
 ____________________________________________________________|____________________________________________________________
                   |            |                  |         |
+ fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
+                  |            |                  |         | vaddr_end for KASLR
+ fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
+ fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
+ ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
  ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
  ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
  ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index cd0cf1c568b4..8f657286d599 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -33,12 +33,14 @@
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
- * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
- * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
- * what Xen requires.
+ * PGDIR_SIZE*17 (pgd slot 273).
+ *
+ * The gap is to allow a space for LDT remap for PTI (1 pgd slot) and space for
+ * a hypervisor (16 slots). Choosing 16 slots for a hypervisor is arbitrary,
+ * but it's what Xen requires.
  */
-#define __PAGE_OFFSET_BASE_L5	_AC(0xff10000000000000, UL)
-#define __PAGE_OFFSET_BASE_L4	_AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE_L5	_AC(0xff11000000000000, UL)
+#define __PAGE_OFFSET_BASE_L4	_AC(0xffff888000000000, UL)
 
 #ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
 #define __PAGE_OFFSET           page_offset_base
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 04edd2d58211..84bd9bdc1987 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -111,9 +111,7 @@ extern unsigned int ptrs_per_p4d;
  */
 #define MAXMEM			(1UL << MAX_PHYSMEM_BITS)
 
-#define LDT_PGD_ENTRY_L4	-3UL
-#define LDT_PGD_ENTRY_L5	-112UL
-#define LDT_PGD_ENTRY		(pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
+#define LDT_PGD_ENTRY		-240UL
 #define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
 #define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
 
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 70ea598a37d2..7a2a74c2dd30 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1905,7 +1905,7 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	init_top_pgt[0] = __pgd(0);
 
 	/* Pre-constructed entries are in pfn, so convert to mfn */
-	/* L4[272] -> level3_ident_pgt  */
+	/* L4[273] -> level3_ident_pgt  */
 	/* L4[511] -> level3_kernel_pgt */
 	convert_pfn_mfn(init_top_pgt);
 
@@ -1925,8 +1925,8 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 	addr[0] = (unsigned long)pgd;
 	addr[1] = (unsigned long)l3;
 	addr[2] = (unsigned long)l2;
-	/* Graft it onto L4[272][0]. Note that we creating an aliasing problem:
-	 * Both L4[272][0] and L4[511][510] have entries that point to the same
+	/* Graft it onto L4[273][0]. Note that we creating an aliasing problem:
+	 * Both L4[273][0] and L4[511][510] have entries that point to the same
 	 * L2 (PMD) tables. Meaning that if you modify it in __va space
 	 * it will be also modified in the __ka space! (But if you just
 	 * modify the PMD table to point to other PTE's or none, then you
-- 
2.19.1
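
As a rough check of the arithmetic in the patch above, the following standalone C sketch (userspace, not kernel code) recomputes the LDT remap base from the new LDT_PGD_ENTRY value of -240 and the PGD slot that a given address falls into. The PGDIR_SHIFT values of 39 (4-level) and 48 (5-level) and the 512-entry table size are assumptions based on the usual x86-64 paging layout, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values: PGDIR_SHIFT is 39 with 4-level and 48 with 5-level paging. */
#define PGDIR_SHIFT_L4	39
#define PGDIR_SHIFT_L5	48
#define PTRS_PER_PGD	512

/* Value introduced by the patch; the same for both paging modes. */
#define LDT_PGD_ENTRY	((uint64_t)-240)

/* Hypothetical helper mirroring LDT_BASE_ADDR = LDT_PGD_ENTRY << PGDIR_SHIFT. */
static uint64_t ldt_base_addr(unsigned int pgdir_shift)
{
	assert(pgdir_shift == PGDIR_SHIFT_L4 || pgdir_shift == PGDIR_SHIFT_L5);
	return LDT_PGD_ENTRY << pgdir_shift;
}

/* Hypothetical helper: index of the top-level (PGD) slot that covers addr. */
static unsigned int pgd_slot(uint64_t addr, unsigned int pgdir_shift)
{
	return (unsigned int)((addr >> pgdir_shift) & (PTRS_PER_PGD - 1));
}
```

With these definitions, ldt_base_addr(39) evaluates to 0xffff880000000000 and ldt_base_addr(48) to 0xff10000000000000. Both fall into PGD slot 272, one slot below the new __PAGE_OFFSET_BASE_L4 and __PAGE_OFFSET_BASE_L5 values, which fall into slot 273.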



* [PATCHv2 2/2] x86/ldt: Unmap PTEs for the slot before freeing LDT pages
  2018-10-24 12:51 [PATCHv2 0/2] Fix couple of issues with LDT remap for PTI Kirill A. Shutemov
  2018-10-24 12:51 ` [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
@ 2018-10-24 12:51 ` Kirill A. Shutemov
  1 sibling, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2018-10-24 12:51 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, dave.hansen, luto, peterz
  Cc: boris.ostrovsky, jgross, bhe, willy, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov

modify_ldt(2) leaves the old LDT mapped after we switch over to the new
one. The memory for the old LDT gets freed and the pages can be reused.

Leaving the mapping in place can have security implications. The mapping
is present in the userspace copy of the page tables, and a Meltdown-like
attack can read these freed and possibly reused pages.

It's relatively simple to fix: just unmap the old LDT and flush the TLB
before freeing the LDT memory.

We can now avoid flushing the TLB in map_ldt_struct(), as the slot is
unmapped and flushed by unmap_ldt_struct() (or was never mapped in
the first place). The overhead of the change should be negligible;
it shouldn't be a particularly hot path anyway.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on")
---
 arch/x86/kernel/ldt.c | 59 ++++++++++++++++++++++++++++---------------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index ab18e0884dc6..5dc8ed202fa8 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -199,14 +199,6 @@ static void sanity_check_ldt_mapping(struct mm_struct *mm)
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
- *
- * There is no corresponding unmap function.  Even if the LDT is freed, we
- * leave the PTEs around until the slot is reused or the mm is destroyed.
- * This is harmless: the LDT is always in ordinary memory, and no one will
- * access the freed slot.
- *
- * If we wanted to unmap freed LDTs, we'd also need to do a flush to make
- * it useful, and the flush would slow down modify_ldt().
  */
 static int
 map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
@@ -214,8 +206,7 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	unsigned long va;
 	bool is_vmalloc;
 	spinlock_t *ptl;
-	pgd_t *pgd;
-	int i;
+	int i, nr_pages;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return 0;
@@ -229,16 +220,10 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	/* Check if the current mappings are sane */
 	sanity_check_ldt_mapping(mm);
 
-	/*
-	 * Did we already have the top level entry allocated?  We can't
-	 * use pgd_none() for this because it doens't do anything on
-	 * 4-level page table kernels.
-	 */
-	pgd = pgd_offset(mm, LDT_BASE_ADDR);
-
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
-	for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) {
+	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
+	for (i = 0; i < nr_pages; i++) {
 		unsigned long offset = i << PAGE_SHIFT;
 		const void *src = (char *)ldt->entries + offset;
 		unsigned long pfn;
@@ -272,13 +257,39 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	/* Propagate LDT mapping to the user page-table */
 	map_ldt_struct_to_user(mm);
 
-	va = (unsigned long)ldt_slot_va(slot);
-	flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, PAGE_SHIFT, false);
-
 	ldt->slot = slot;
 	return 0;
 }
 
+static void
+unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
+{
+	unsigned long va;
+	int i, nr_pages;
+
+	if (!ldt)
+		return;
+
+	/* LDT map/unmap is only required for PTI */
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
+
+	nr_pages = DIV_ROUND_UP(ldt->nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
+	for (i = 0; i < nr_pages; i++) {
+		unsigned long offset = i << PAGE_SHIFT;
+		pte_t *ptep;
+		spinlock_t *ptl;
+
+		va = (unsigned long)ldt_slot_va(ldt->slot) + offset;
+		ptep = get_locked_pte(mm, va, &ptl);
+		pte_clear(mm, va, ptep);
+		pte_unmap_unlock(ptep, ptl);
+	}
+
+	va = (unsigned long)ldt_slot_va(ldt->slot);
+	flush_tlb_mm_range(mm, va, va + nr_pages * PAGE_SIZE, 0, false);
+}
+
 #else /* !CONFIG_PAGE_TABLE_ISOLATION */
 
 static int
@@ -286,6 +297,11 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 {
 	return 0;
 }
+
+static void
+unmap_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt)
+{
+}
 #endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
 static void free_ldt_pgtables(struct mm_struct *mm)
@@ -524,6 +540,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	}
 
 	install_ldt(mm, new_ldt);
+	unmap_ldt_struct(mm, old_ldt);
 	free_ldt_struct(old_ldt);
 	error = 0;
 
-- 
2.19.1
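
The patch above replaces the open-coded loop bound in map_ldt_struct() with a DIV_ROUND_UP() page count that unmap_ldt_struct() reuses. The standalone C sketch below (userspace, not kernel code) illustrates that the two forms count the same number of pages. It assumes 8-byte LDT descriptors and 4 KiB pages; the function names are hypothetical.

```c
#include <assert.h>

/* Assumed x86 values: 8-byte LDT descriptors and 4 KiB pages. */
#define LDT_ENTRY_SIZE	8
#define PAGE_SIZE	4096UL

/* Same rounding-up division idiom as the kernel macro of this name. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Descriptors never straddle a page boundary under these assumptions. */
static_assert(PAGE_SIZE % LDT_ENTRY_SIZE == 0, "descriptor size must divide page size");

/* Page count as computed after the patch. */
static unsigned long ldt_nr_pages(unsigned long nr_entries)
{
	return DIV_ROUND_UP(nr_entries * LDT_ENTRY_SIZE, PAGE_SIZE);
}

/* Page count implied by the loop condition that the patch removes. */
static unsigned long ldt_nr_pages_old_loop(unsigned long nr_entries)
{
	unsigned long i;

	for (i = 0; i * PAGE_SIZE < nr_entries * LDT_ENTRY_SIZE; i++)
		;
	return i;
}
```

Under these assumptions, an LDT with 513 entries needs two pages and one with 8192 entries needs 16, whichever form is used.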



* Re: [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-24 12:51 ` [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
@ 2018-10-24 13:12   ` Matthew Wilcox
  2018-10-25  2:18   ` Baoquan He
  1 sibling, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2018-10-24 13:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, boris.ostrovsky,
	jgross, bhe, x86, linux-mm, linux-kernel

On Wed, Oct 24, 2018 at 03:51:11PM +0300, Kirill A. Shutemov wrote:
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -34,23 +34,24 @@ __________________|____________|__________________|_________|___________________
>  ____________________________________________________________|___________________________________________________________
>                    |            |                  |         |
>   ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor

Oh good, it's been rewritten for people with 200-column screens.  It's
too painful to review now.

This is how it looks for me, Ingo:

> @@ -34,23 +34,24 @@ __________________|____________|__________________|_______
__|___________________
>  ____________________________________________________________|________________
___________________________________________
>                    |            |                  |         |
>   ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole,
 also reserved for hypervisor

If it were being formatted in rst so we could get a nice html view out
of the conversion, I'd understand.  But I don't see what we get from
this hilariously verbose reformatting.


* Re: [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-24 12:51 ` [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging Kirill A. Shutemov
  2018-10-24 13:12   ` Matthew Wilcox
@ 2018-10-25  2:18   ` Baoquan He
  2018-10-25  7:24     ` Kirill A. Shutemov
  1 sibling, 1 reply; 7+ messages in thread
From: Baoquan He @ 2018-10-25  2:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, hpa, dave.hansen, luto, peterz, boris.ostrovsky,
	jgross, willy, x86, linux-mm, linux-kernel

Hi Kirill,

Thanks for making this patchset. I have a few small concerns; please see
the inline comments.

On 10/24/18 at 03:51pm, Kirill A. Shutemov wrote:
> On 5-level paging LDT remap area is placed in the middle of
> KASLR randomization region and it can overlap with direct mapping,
> vmalloc or vmap area.
> 
> Let's move LDT just before direct mapping which makes it safe for KASLR.
> This also allows us to unify layout between 4- and 5-level paging.

In the crash utility and makedumpfile, which are used to analyze system
memory content, PAGE_OFFSET is hardcoded as below for the non-KASLR case:

#define PAGE_OFFSET_2_6_27         0xffff880000000000

It seems they will now need to add another value, for both 4-level and
5-level paging, since the 5-level code also exists in stable kernels.
This doesn't matter much, though.
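
To make the concern concrete, here is a minimal sketch of how a dump-analysis tool might choose its non-KASLR default. It is not taken from crash or makedumpfile: the macro and function names are hypothetical, and how a tool detects whether the running kernel carries this patch is deliberately left to the caller. Only the four base addresses come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Direct-mapping bases before and after the patch (values from the diff). */
#define PAGE_OFFSET_L4_OLD	0xffff880000000000ULL
#define PAGE_OFFSET_L4_NEW	0xffff888000000000ULL
#define PAGE_OFFSET_L5_OLD	0xff10000000000000ULL
#define PAGE_OFFSET_L5_NEW	0xff11000000000000ULL

/* The direct mapping moves up by exactly one PGD slot in each paging mode. */
static_assert(PAGE_OFFSET_L4_NEW - PAGE_OFFSET_L4_OLD == 1ULL << 39, "one 4-level PGD slot");
static_assert(PAGE_OFFSET_L5_NEW - PAGE_OFFSET_L5_OLD == 1ULL << 48, "one 5-level PGD slot");

/*
 * Hypothetical helper: pick the non-KASLR PAGE_OFFSET default.
 * ldt_before_direct_map says whether the kernel places the LDT remap
 * in the slot just below the direct mapping, as this patch does.
 */
static uint64_t default_page_offset(bool five_level, bool ldt_before_direct_map)
{
	if (five_level)
		return ldt_before_direct_map ? PAGE_OFFSET_L5_NEW : PAGE_OFFSET_L5_OLD;
	return ldt_before_direct_map ? PAGE_OFFSET_L4_NEW : PAGE_OFFSET_L4_OLD;
}
```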

> 
> We don't touch 4 pgd slot gap just before the direct mapping reserved
> for a hypervisor, but move direct mapping by one slot instead.
> 
> The LDT mapping is per-mm, so we cannot move it into P4D page table next
> to CPU_ENTRY_AREA without complicating PGD table allocation for 5-level
> paging.

As discussed in the private thread, you initially agreed to put it in the
p4d entry next to CPU_ENTRY_AREA, but later changed your mind. I assume
you found some reasons when you implemented it and investigated further.
Could you please say more about how it would complicate PGD table
allocation for 5-level paging, or give a use case where it would?

Sorry, I still don't get the point. I'd really appreciate an explanation.

Thanks
Baoquan


* Re: [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-25  2:18   ` Baoquan He
@ 2018-10-25  7:24     ` Kirill A. Shutemov
  2018-10-25  8:11       ` Baoquan He
  0 siblings, 1 reply; 7+ messages in thread
From: Kirill A. Shutemov @ 2018-10-25  7:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: Kirill A. Shutemov, tglx, mingo, bp, hpa, dave.hansen, luto,
	peterz, boris.ostrovsky, jgross, willy, x86, linux-mm,
	linux-kernel

On Thu, Oct 25, 2018 at 10:18:09AM +0800, Baoquan He wrote:
> > We don't touch 4 pgd slot gap just before the direct mapping reserved
> > for a hypervisor, but move direct mapping by one slot instead.
> > 
> > The LDT mapping is per-mm, so we cannot move it into P4D page table next
> > to CPU_ENTRY_AREA without complicating PGD table allocation for 5-level
> > paging.
> 
> Here as discussed in private thread, at the first place you also agreed
> to put it in p4d entry next to CPU_ENTRY_AREA, but finally you changed
> mind, there must be some reasons when you implemented and investigated
> further to find out. Could you please say more about how it will
> complicating PGD table allocation for 5-level paging? Or give an use
> case where it will complicate?

On a 5-level machine, all memory starting from CPU_ENTRY_AREA (and part of
the KASAN memory) is in the same P4D page table. All this memory is shared
across all processes: we just copy the PGD entry, so all processes point to
the same P4D page table. (I leave PTI out of the picture for simplicity.)

The LDT is per-mm. If we placed it next to CPU_ENTRY_AREA, we would need
to unshare the P4D page table: create a new one on each fork and copy the
P4D entries.

That is considerably more complex and would affect processes that never
use modify_ldt() at all.

Another option would be to move the LDT remap *into* the KASLR region for
both paging modes and make the KASLR code aware of it: randomize it as we
do for page offset, vmalloc, vmap. It's probably better long term, but
it's more complex and I wanted to get a backportable fix.

-- 
 Kirill A. Shutemov


* Re: [PATCHv2 1/2] x86/mm: Move LDT remap out of KASLR region on 5-level paging
  2018-10-25  7:24     ` Kirill A. Shutemov
@ 2018-10-25  8:11       ` Baoquan He
  0 siblings, 0 replies; 7+ messages in thread
From: Baoquan He @ 2018-10-25  8:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, tglx, mingo, bp, hpa, dave.hansen, luto,
	peterz, boris.ostrovsky, jgross, willy, x86, linux-mm,
	linux-kernel

On 10/25/18 at 10:24am, Kirill A. Shutemov wrote:
> On Thu, Oct 25, 2018 at 10:18:09AM +0800, Baoquan He wrote:
> > > We don't touch 4 pgd slot gap just before the direct mapping reserved
> > > for a hypervisor, but move direct mapping by one slot instead.
> > > 
> > > The LDT mapping is per-mm, so we cannot move it into P4D page table next
> > > to CPU_ENTRY_AREA without complicating PGD table allocation for 5-level
> > > paging.
> > 
> > Here as discussed in private thread, at the first place you also agreed
> > > to put it in p4d entry next to CPU_ENTRY_AREA, but finally you changed
> > mind, there must be some reasons when you implemented and investigated
> > further to find out. Could you please say more about how it will
> > complicating PGD table allocation for 5-level paging? Or give an use
> > case where it will complicate?
> 
> On 5-level machine all memory starting from CPU_ENTRY_AREA (and part of
> KASAN memory) is in the same P4D page table. All this memory is shared
> > across all processes, we just copy PGD entry -- all processes point to the
> same P4D page table. (I leave out PTI from the picture for simplicity.)

Yes, got it, I didn't notice this, thanks a lot.


