linux-mm.kvack.org archive mirror
* [PATCH 0/5] Last bits for initial 5-level paging enabling
@ 2017-06-22 12:26 Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 1/5] x86: Enable 5-level paging support Kirill A. Shutemov
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-22 12:26 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

As Ingo requested, I've split and updated the last two patches from my
previous patchset.

Please review and consider applying.

Kirill A. Shutemov (5):
  x86: Enable 5-level paging support
  x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
  x86/mpx: Do not allow MPX if we have mappings above 47-bit
  x86/mm: Prepare to expose larger address space to userspace
  x86/mm: Allow userspace have mapping above 47-bit

 Documentation/x86/x86_64/5level-paging.txt | 64 ++++++++++++++++++++++++++++++
 arch/x86/Kconfig                           | 18 +++++++++
 arch/x86/include/asm/elf.h                 |  6 +--
 arch/x86/include/asm/mpx.h                 |  9 +++++
 arch/x86/include/asm/processor.h           | 12 ++++--
 arch/x86/kernel/sys_x86_64.c               | 30 ++++++++++++--
 arch/x86/mm/hugetlbpage.c                  | 27 +++++++++++--
 arch/x86/mm/mmap.c                         | 12 +++---
 arch/x86/mm/mpx.c                          | 33 ++++++++++++++-
 arch/x86/xen/Kconfig                       |  3 ++
 10 files changed, 193 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/x86/x86_64/5level-paging.txt

-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 1/5] x86: Enable 5-level paging support
  2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
@ 2017-06-22 12:26 ` Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 2/5] x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit Kirill A. Shutemov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-22 12:26 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

Most things are in place and we can enable support for 5-level paging.

The patch makes XEN_PV dependent on !X86_5LEVEL. XEN_PV is not ready to
work with 5-level paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 Documentation/x86/x86_64/5level-paging.txt | 64 ++++++++++++++++++++++++++++++
 arch/x86/Kconfig                           | 18 +++++++++
 arch/x86/xen/Kconfig                       |  3 ++
 3 files changed, 85 insertions(+)
 create mode 100644 Documentation/x86/x86_64/5level-paging.txt

diff --git a/Documentation/x86/x86_64/5level-paging.txt b/Documentation/x86/x86_64/5level-paging.txt
new file mode 100644
index 000000000000..087251a0d99c
--- /dev/null
+++ b/Documentation/x86/x86_64/5level-paging.txt
@@ -0,0 +1,64 @@
+== Overview ==
+
+Original x86-64 was limited by 4-level paging to 256 TiB of virtual address
+space and 64 TiB of physical address space. We are already bumping into
+this limit: some vendors offer servers with 64 TiB of memory today.
+
+To overcome the limitation, upcoming hardware will introduce support for
+5-level paging. It is a straightforward extension of the current page
+table structure, adding one more layer of translation.
+
+It bumps the limits to 128 PiB of virtual address space and 4 PiB of
+physical address space. This "ought to be enough for anybody" ©.
+
+QEMU 2.9 and later support 5-level paging.
+
+Virtual memory layout for 5-level paging is described in
+Documentation/x86/x86_64/mm.txt
+
+== Enabling 5-level paging ==
+
+CONFIG_X86_5LEVEL=y enables the feature.
+
+So far, a kernel compiled with the option enabled will be able to boot
+only on machines that support the feature -- look for the 'la57' flag in
+/proc/cpuinfo.
+
+The plan is to implement boot-time switching between 4- and 5-level paging
+in the future.
+
+== User-space and large virtual address space ==
+
+On x86, 5-level paging enables a 56-bit userspace virtual address space.
+Not all user space is ready to handle wide addresses. It's known that
+at least some JIT compilers use higher bits in pointers to encode their
+information. This collides with valid pointers under 5-level paging and
+leads to crashes.
+
+To mitigate this, we are not going to allocate virtual address space
+above 47-bit by default.
+
+But userspace can ask for allocation from the full address space by
+specifying a hint address (with or without MAP_FIXED) above 47-bit.
+
+If the hint address is set above 47-bit but MAP_FIXED is not specified, we
+try to look for an unmapped area at the specified address. If it's already
+occupied, we look for an unmapped area in the *full* address space, rather
+than in the 47-bit window.
+
+A high hint address would only affect the allocation in question, but not
+any future mmap()s.
+
+Specifying a high hint address on an older kernel or on a machine without
+5-level paging support is safe. The hint will be ignored and the kernel will
+fall back to allocation from the 47-bit address space.
+
+This approach makes it easy to make an application's memory allocator aware
+of the large address space without manually tracking allocated virtual
+address space.
+
+One important case we need to handle here is the interaction with MPX.
+MPX (without the MAWA extension) cannot handle addresses above 47-bit, so we
+need to make sure that MPX cannot be enabled if we already have a VMA above
+the boundary, and forbid creating such VMAs once MPX is enabled.
+
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 72028a16327b..dc91c1763736 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -320,6 +320,7 @@ config FIX_EARLYCON_MEM
 
 config PGTABLE_LEVELS
 	int
+	default 5 if X86_5LEVEL
 	default 4 if X86_64
 	default 3 if X86_PAE
 	default 2
@@ -1392,6 +1393,23 @@ config X86_PAE
 	  has the cost of more pagetable lookup overhead, and also
 	  consumes more pagetable space per process.
 
+config X86_5LEVEL
+	bool "Enable 5-level page tables support"
+	depends on X86_64
+	---help---
+	  5-level paging enables access to a larger address space:
+	  up to 128 PiB of virtual address space and 4 PiB of
+	  physical address space.
+
+	  It will be supported by future Intel CPUs.
+
+	  Note: kernel with the option enabled can only be booted
+	  on machines that support the feature.
+
+	  See Documentation/x86/x86_64/5level-paging.txt for more info.
+
+	  Say N if unsure.
+
 config ARCH_PHYS_ADDR_T_64BIT
 	def_bool y
 	depends on X86_64 || X86_PAE
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 027987638e98..72bd4c62b742 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -17,6 +17,9 @@ config XEN_PV
 	bool "Xen PV guest support"
 	default y
 	depends on XEN
+	# XEN_PV is not ready to work with 5-level paging.
+	# Changes to hypervisor are also required.
+	depends on !X86_5LEVEL
 	select XEN_HAVE_PVMMU
 	select XEN_HAVE_VPMU
 	help
-- 
2.11.0


* [PATCH 2/5] x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
  2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 1/5] x86: Enable 5-level paging support Kirill A. Shutemov
@ 2017-06-22 12:26 ` Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 3/5] x86/mpx: Do not allow MPX if we have mappings above 47-bit Kirill A. Shutemov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-22 12:26 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

Rename these helpers to be consistent with spelling of TASK_SIZE and
related constants.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/elf.h   |  4 ++--
 arch/x86/kernel/sys_x86_64.c |  2 +-
 arch/x86/mm/hugetlbpage.c    |  2 +-
 arch/x86/mm/mmap.c           | 10 +++++-----
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index e8ab9a46bc68..e68e9ff588c8 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -303,8 +303,8 @@ static inline int mmap_is_ia32(void)
 		test_thread_flag(TIF_ADDR32));
 }
 
-extern unsigned long tasksize_32bit(void);
-extern unsigned long tasksize_64bit(void);
+extern unsigned long task_size_32bit(void);
+extern unsigned long task_size_64bit(void);
 extern unsigned long get_mmap_base(int is_legacy);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 213ddf3e937d..89bd0d6460e1 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -120,7 +120,7 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
 	}
 
 	*begin	= get_mmap_base(1);
-	*end	= in_compat_syscall() ? tasksize_32bit() : tasksize_64bit();
+	*end	= in_compat_syscall() ? task_size_32bit() : task_size_64bit();
 }
 
 unsigned long
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index adad702b39cd..93bfd6d7ce1c 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -86,7 +86,7 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 	info.length = len;
 	info.low_limit = get_mmap_base(1);
 	info.high_limit = in_compat_syscall() ?
-		tasksize_32bit() : tasksize_64bit();
+		task_size_32bit() : task_size_64bit();
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
 	return vm_unmapped_area(&info);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 19ad095b41df..b99c1a29dcca 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -37,12 +37,12 @@ struct va_alignment __read_mostly va_align = {
 	.flags = -1,
 };
 
-unsigned long tasksize_32bit(void)
+unsigned long task_size_32bit(void)
 {
 	return IA32_PAGE_OFFSET;
 }
 
-unsigned long tasksize_64bit(void)
+unsigned long task_size_64bit(void)
 {
 	return TASK_SIZE_MAX;
 }
@@ -52,7 +52,7 @@ static unsigned long stack_maxrandom_size(unsigned long task_size)
 	unsigned long max = 0;
 	if ((current->flags & PF_RANDOMIZE) &&
 		!(current->personality & ADDR_NO_RANDOMIZE)) {
-		max = (-1UL) & __STACK_RND_MASK(task_size == tasksize_32bit());
+		max = (-1UL) & __STACK_RND_MASK(task_size == task_size_32bit());
 		max <<= PAGE_SHIFT;
 	}
 
@@ -140,7 +140,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 
 	arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
-			arch_rnd(mmap64_rnd_bits), tasksize_64bit());
+			arch_rnd(mmap64_rnd_bits), task_size_64bit());
 
 #ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
 	/*
@@ -150,7 +150,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 	 * mmap_base, the compat syscall uses mmap_compat_base.
 	 */
 	arch_pick_mmap_base(&mm->mmap_compat_base, &mm->mmap_compat_legacy_base,
-			arch_rnd(mmap32_rnd_bits), tasksize_32bit());
+			arch_rnd(mmap32_rnd_bits), task_size_32bit());
 #endif
 }
 
-- 
2.11.0


* [PATCH 3/5] x86/mpx: Do not allow MPX if we have mappings above 47-bit
  2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 1/5] x86: Enable 5-level paging support Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 2/5] x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit Kirill A. Shutemov
@ 2017-06-22 12:26 ` Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 4/5] x86/mm: Prepare to expose larger address space to userspace Kirill A. Shutemov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-22 12:26 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

MPX (without the MAWA extension) cannot handle addresses above 47-bit, so we
need to make sure that MPX cannot be enabled if we already have a VMA above
the boundary, and forbid creating such VMAs once MPX is enabled.

The patch implements mpx_unmapped_area_check(), which is called from all
variants of get_unmapped_area() to check whether the requested address fits
MPX.

On enabling MPX, we check whether we already have any VMA above the 47-bit
boundary and forbid the enabling if we do.

As long as DEFAULT_MAP_WINDOW is equal to TASK_SIZE_MAX, the change is a
no-op. It will change when we allow userspace to have mappings above
47-bit.
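
The possible outcomes of the check described above can be modelled in
userspace. This is only a sketch: model_mpx_check(), the MODEL_* constants
and the mpx_enabled/map_fixed parameters are hypothetical stand-ins for
kernel state, and the window value assumes the 47-bit limit with 4 KiB
pages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel constants (assumes 4 KiB pages). */
#define MODEL_PAGE_SIZE          4096ULL
#define MODEL_DEFAULT_MAP_WINDOW ((1ULL << 47) - MODEL_PAGE_SIZE)

/*
 * Userspace model of the check. Returns:
 *   addr     the request is acceptable as is,
 *   0        drop the hint and search inside the 47-bit window,
 *   -ENOMEM  the request cannot be satisfied while MPX is enabled.
 */
static int64_t model_mpx_check(uint64_t addr, uint64_t len,
			       bool map_fixed, bool mpx_enabled)
{
	if (!mpx_enabled)
		return (int64_t)addr;
	if (addr + len <= MODEL_DEFAULT_MAP_WINDOW)
		return (int64_t)addr;
	if (map_fixed)
		return -ENOMEM;
	/* Larger than the whole window: resetting the hint cannot help. */
	if (len > MODEL_DEFAULT_MAP_WINDOW)
		return -ENOMEM;
	return 0;
}

static void model_mpx_check_examples(void)
{
	/* MPX off: even a high address passes through untouched. */
	assert(model_mpx_check(1ULL << 48, 4096, false, false) ==
	       (int64_t)(1ULL << 48));
	/* MPX on, request below the window: unchanged. */
	assert(model_mpx_check(0x10000, 4096, false, true) == 0x10000);
	/* MPX on, high hint without MAP_FIXED: hint is reset to 0. */
	assert(model_mpx_check(1ULL << 48, 4096, false, true) == 0);
	/* MPX on, high address with MAP_FIXED: rejected. */
	assert(model_mpx_check(1ULL << 48, 4096, true, true) == -ENOMEM);
	/* MPX on, length exceeds the window: rejected early. */
	assert(model_mpx_check(0, 1ULL << 47, false, true) == -ENOMEM);
}
```

A return value of 0 is what makes the callers fall through to their
normal search, which is then bounded by the default window.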

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/mpx.h       |  9 +++++++++
 arch/x86/include/asm/processor.h |  3 +++
 arch/x86/kernel/sys_x86_64.c     |  9 +++++++++
 arch/x86/mm/hugetlbpage.c        |  6 ++++++
 arch/x86/mm/mpx.c                | 33 ++++++++++++++++++++++++++++++++-
 5 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index a0d662be4c5b..7d7404756bb4 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -73,6 +73,9 @@ static inline void mpx_mm_init(struct mm_struct *mm)
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end);
+
+unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
+		unsigned long flags);
 #else
 static inline siginfo_t *mpx_generate_siginfo(struct pt_regs *regs)
 {
@@ -94,6 +97,12 @@ static inline void mpx_notify_unmap(struct mm_struct *mm,
 				    unsigned long start, unsigned long end)
 {
 }
+
+static inline unsigned long mpx_unmapped_area_check(unsigned long addr,
+		unsigned long len, unsigned long flags)
+{
+	return addr;
+}
 #endif /* CONFIG_X86_INTEL_MPX */
 
 #endif /* _ASM_X86_MPX_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f3b1b27f1c0a..97e9cada4945 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -803,6 +803,7 @@ static inline void spin_lock_prefetch(const void *x)
 #define IA32_PAGE_OFFSET	PAGE_OFFSET
 #define TASK_SIZE		PAGE_OFFSET
 #define TASK_SIZE_MAX		TASK_SIZE
+#define DEFAULT_MAP_WINDOW	TASK_SIZE
 #define STACK_TOP		TASK_SIZE
 #define STACK_TOP_MAX		STACK_TOP
 
@@ -844,6 +845,8 @@ static inline void spin_lock_prefetch(const void *x)
  */
 #define TASK_SIZE_MAX	((1UL << 47) - PAGE_SIZE)
 
+#define DEFAULT_MAP_WINDOW	TASK_SIZE_MAX
+
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 89bd0d6460e1..f840e895d871 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -21,6 +21,7 @@
 #include <asm/compat.h>
 #include <asm/ia32.h>
 #include <asm/syscalls.h>
+#include <asm/mpx.h>
 
 /*
  * Align a virtual address to avoid aliasing in the I$ on AMD F15h.
@@ -132,6 +133,10 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	struct vm_unmapped_area_info info;
 	unsigned long begin, end;
 
+	addr = mpx_unmapped_area_check(addr, len, flags);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
 	if (flags & MAP_FIXED)
 		return addr;
 
@@ -171,6 +176,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	unsigned long addr = addr0;
 	struct vm_unmapped_area_info info;
 
+	addr = mpx_unmapped_area_check(addr, len, flags);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
 	/* requested length too big for entire address space */
 	if (len > TASK_SIZE)
 		return -ENOMEM;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 93bfd6d7ce1c..afd5f2152300 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -18,6 +18,7 @@
 #include <asm/tlbflush.h>
 #include <asm/pgalloc.h>
 #include <asm/elf.h>
+#include <asm/mpx.h>
 
 #if 0	/* This is just for testing */
 struct page *
@@ -135,6 +136,11 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 
 	if (len & ~huge_page_mask(h))
 		return -EINVAL;
+
+	addr = mpx_unmapped_area_check(addr, len, flags);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 1c34b767c84c..8c8da27e8549 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -355,10 +355,19 @@ int mpx_enable_management(void)
 	 */
 	bd_base = mpx_get_bounds_dir();
 	down_write(&mm->mmap_sem);
+
+	/* MPX doesn't support addresses above 47-bits yet. */
+	if (find_vma(mm, DEFAULT_MAP_WINDOW)) {
+		pr_warn_once("%s (%d): MPX cannot handle addresses "
+				"above 47-bits. Disabling.",
+				current->comm, current->pid);
+		ret = -ENXIO;
+		goto out;
+	}
 	mm->context.bd_addr = bd_base;
 	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
 		ret = -ENXIO;
-
+out:
 	up_write(&mm->mmap_sem);
 	return ret;
 }
@@ -1030,3 +1039,25 @@ void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (ret)
 		force_sig(SIGSEGV, current);
 }
+
+/* MPX cannot handle addresses above 47-bits yet. */
+unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
+		unsigned long flags)
+{
+	if (!kernel_managing_mpx_tables(current->mm))
+		return addr;
+	if (addr + len <= DEFAULT_MAP_WINDOW)
+		return addr;
+	if (flags & MAP_FIXED)
+		return -ENOMEM;
+
+	/*
+	 * Requested len is larger than whole area we're allowed to map in.
+	 * Resetting hinting address wouldn't do much good -- fail early.
+	 */
+	if (len > DEFAULT_MAP_WINDOW)
+		return -ENOMEM;
+
+	/* Look for unmapped area within DEFAULT_MAP_WINDOW */
+	return 0;
+}
-- 
2.11.0


* [PATCH 4/5] x86/mm: Prepare to expose larger address space to userspace
  2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2017-06-22 12:26 ` [PATCH 3/5] x86/mpx: Do not allow MPX if we have mappings above 47-bit Kirill A. Shutemov
@ 2017-06-22 12:26 ` Kirill A. Shutemov
  2017-06-22 12:26 ` [PATCH 5/5] x86/mm: Allow userspace have mapping above 47-bit Kirill A. Shutemov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-22 12:26 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

On x86, 5-level paging enables a 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use higher bits in pointers to encode their
information. This collides with valid pointers under 5-level paging and
leads to crashes.

To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from the full address space by
specifying a hint address (with or without MAP_FIXED) above 47-bit.

If the hint address is set above 47-bit but MAP_FIXED is not specified, we
try to look for an unmapped area at the specified address. If it's already
occupied, we look for an unmapped area in the *full* address space, rather
than in the 47-bit window.

A high hint address would only affect the allocation in question, but not
any future mmap()s.
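
From the application side, the opt-in described above could look roughly
like this. It is a minimal sketch, not taken from the patch: map_anywhere(),
map_default(), demo_high_hint() and HIGH_HINT are hypothetical names, the
hint value is just one arbitrary address above the 47-bit boundary, and a
64-bit build is assumed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Illustrative hint just above the 47-bit boundary (assumption). */
#define HIGH_HINT ((void *)(uintptr_t)(1ULL << 47))

/*
 * Request an anonymous mapping and opt in to the full address space
 * for this call only, by passing a high hint without MAP_FIXED.
 * Returns NULL on failure.
 */
static void *map_anywhere(size_t len)
{
	void *p = mmap(HIGH_HINT, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Ordinary request with no hint: the default window applies. */
static void *map_default(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Both calls are expected to succeed whether or not la57 is present. */
static void demo_high_hint(void)
{
	char *hi = map_anywhere(4096);
	char *lo = map_default(4096);

	assert(hi && lo);
	hi[0] = lo[0] = 1;	/* both are ordinary writable memory */
	munmap(hi, 4096);
	munmap(lo, 4096);
}
```

With 5-level paging active, the first mapping may be placed above 47-bit,
while the second one should stay inside the default window because the
earlier hint does not carry over.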

Specifying a high hint address on an older kernel or on a machine without
5-level paging support is safe. The hint will be ignored and the kernel will
fall back to allocation from the 47-bit address space.

This approach makes it easy to make an application's memory allocator aware
of the large address space without manually tracking allocated virtual
address space.

The patch puts all the machinery in place, but does not yet allow userspace
to have mappings above 47-bit -- TASK_SIZE_MAX has to be raised to get the
effect.
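
How the hint selects the upper bound of the search can be modelled in
isolation. This is a simplified sketch under stated assumptions: the
model_* names are hypothetical, 4 KiB pages are assumed, and the compat
(32-bit/x32) handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel constants (assumes 4 KiB pages). */
#define MODEL_PAGE_SIZE          4096ULL
#define MODEL_DEFAULT_MAP_WINDOW ((1ULL << 47) - MODEL_PAGE_SIZE)

/* Bottom-up search: the hint only selects which ceiling is used. */
static uint64_t model_bottomup_limit(uint64_t hint, uint64_t task_size_max)
{
	return hint > MODEL_DEFAULT_MAP_WINDOW ? task_size_max
					       : MODEL_DEFAULT_MAP_WINDOW;
}

/*
 * Top-down search: the ceiling starts at mmap_base and is raised by
 * the gap between the maximum and the default window for a high hint.
 */
static uint64_t model_topdown_limit(uint64_t hint, uint64_t mmap_base,
				    uint64_t task_size_max)
{
	uint64_t limit = mmap_base;

	if (hint > MODEL_DEFAULT_MAP_WINDOW)
		limit += task_size_max - MODEL_DEFAULT_MAP_WINDOW;
	return limit;
}

static void model_limit_examples(void)
{
	const uint64_t base = 0x7f0000000000ULL;	/* arbitrary mmap_base */
	const uint64_t wide = (1ULL << 56) - MODEL_PAGE_SIZE;

	/* While both limits coincide, a high hint changes nothing. */
	assert(model_bottomup_limit(1ULL << 48, MODEL_DEFAULT_MAP_WINDOW) ==
	       MODEL_DEFAULT_MAP_WINDOW);
	assert(model_topdown_limit(1ULL << 48, base,
				   MODEL_DEFAULT_MAP_WINDOW) == base);

	/* Once the maximum is raised, only a high hint widens the search. */
	assert(model_bottomup_limit(0, wide) == MODEL_DEFAULT_MAP_WINDOW);
	assert(model_bottomup_limit(1ULL << 48, wide) == wide);
	assert(model_topdown_limit(1ULL << 48, base, wide) ==
	       base + (1ULL << 56) - (1ULL << 47));
}
```

The first pair of assertions reflects why this patch alone has no visible
effect: the added term is zero until the maximum differs from the window.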

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/elf.h       |  4 ++--
 arch/x86/include/asm/processor.h |  7 +++++--
 arch/x86/kernel/sys_x86_64.c     | 21 +++++++++++++++++----
 arch/x86/mm/hugetlbpage.c        | 21 +++++++++++++++++----
 arch/x86/mm/mmap.c               |  6 +++---
 5 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index e68e9ff588c8..7bcd15827a5b 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -250,7 +250,7 @@ extern int force_personality32;
    the loader.  We need to make sure that it is out of the way of the program
    that it will "exec", and that there is sufficient room for the brk.  */
 
-#define ELF_ET_DYN_BASE		(TASK_SIZE / 3 * 2)
+#define ELF_ET_DYN_BASE		(TASK_SIZE_LOW / 3 * 2)
 
 /* This yields a mask that user programs can use to figure out what
    instruction set this CPU supports.  This could be done in user space,
@@ -304,7 +304,7 @@ static inline int mmap_is_ia32(void)
 }
 
 extern unsigned long task_size_32bit(void);
-extern unsigned long task_size_64bit(void);
+extern unsigned long task_size_64bit(int full_addr_space);
 extern unsigned long get_mmap_base(int is_legacy);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 97e9cada4945..42b87689ecd7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -802,6 +802,7 @@ static inline void spin_lock_prefetch(const void *x)
  */
 #define IA32_PAGE_OFFSET	PAGE_OFFSET
 #define TASK_SIZE		PAGE_OFFSET
+#define TASK_SIZE_LOW		TASK_SIZE
 #define TASK_SIZE_MAX		TASK_SIZE
 #define DEFAULT_MAP_WINDOW	TASK_SIZE
 #define STACK_TOP		TASK_SIZE
@@ -853,12 +854,14 @@ static inline void spin_lock_prefetch(const void *x)
 #define IA32_PAGE_OFFSET	((current->personality & ADDR_LIMIT_3GB) ? \
 					0xc0000000 : 0xFFFFe000)
 
+#define TASK_SIZE_LOW		(test_thread_flag(TIF_ADDR32) ? \
+					IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW)
 #define TASK_SIZE		(test_thread_flag(TIF_ADDR32) ? \
 					IA32_PAGE_OFFSET : TASK_SIZE_MAX)
 #define TASK_SIZE_OF(child)	((test_tsk_thread_flag(child, TIF_ADDR32)) ? \
 					IA32_PAGE_OFFSET : TASK_SIZE_MAX)
 
-#define STACK_TOP		TASK_SIZE
+#define STACK_TOP		TASK_SIZE_LOW
 #define STACK_TOP_MAX		TASK_SIZE_MAX
 
 #define INIT_THREAD  {						\
@@ -881,7 +884,7 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
  * space during mmap's.
  */
 #define __TASK_UNMAPPED_BASE(task_size)	(PAGE_ALIGN(task_size / 3))
-#define TASK_UNMAPPED_BASE		__TASK_UNMAPPED_BASE(TASK_SIZE)
+#define TASK_UNMAPPED_BASE		__TASK_UNMAPPED_BASE(TASK_SIZE_LOW)
 
 #define KSTK_EIP(task)		(task_pt_regs(task)->ip)
 
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index f840e895d871..73e4d28112f8 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -101,8 +101,8 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 	return error;
 }
 
-static void find_start_end(unsigned long flags, unsigned long *begin,
-			   unsigned long *end)
+static void find_start_end(unsigned long addr, unsigned long flags,
+		unsigned long *begin, unsigned long *end)
 {
 	if (!in_compat_syscall() && (flags & MAP_32BIT)) {
 		/* This is usually used needed to map code in small
@@ -121,7 +121,10 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
 	}
 
 	*begin	= get_mmap_base(1);
-	*end	= in_compat_syscall() ? task_size_32bit() : task_size_64bit();
+	if (in_compat_syscall())
+		*end = task_size_32bit();
+	else
+		*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
 }
 
 unsigned long
@@ -140,7 +143,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	if (flags & MAP_FIXED)
 		return addr;
 
-	find_start_end(flags, &begin, &end);
+	find_start_end(addr, flags, &begin, &end);
 
 	if (len > end)
 		return -ENOMEM;
@@ -204,6 +207,16 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	info.length = len;
 	info.low_limit = PAGE_SIZE;
 	info.high_limit = get_mmap_base(0);
+
+	/*
+	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
+	 * in the full address space.
+	 *
+	 * !in_compat_syscall() check to avoid high addresses for x32.
+	 */
+	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
+
 	info.align_mask = 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
 	if (filp) {
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index afd5f2152300..6aa0a679e3f3 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -86,25 +86,38 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 	info.flags = 0;
 	info.length = len;
 	info.low_limit = get_mmap_base(1);
+
+	/*
+	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
+	 * in the full address space.
+	 */
 	info.high_limit = in_compat_syscall() ?
-		task_size_32bit() : task_size_64bit();
+		task_size_32bit() : task_size_64bit(addr > DEFAULT_MAP_WINDOW);
+
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
 	return vm_unmapped_area(&info);
 }
 
 static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
-		unsigned long addr0, unsigned long len,
+		unsigned long addr, unsigned long len,
 		unsigned long pgoff, unsigned long flags)
 {
 	struct hstate *h = hstate_file(file);
 	struct vm_unmapped_area_info info;
-	unsigned long addr;
 
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
 	info.low_limit = PAGE_SIZE;
 	info.high_limit = get_mmap_base(0);
+
+	/*
+	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
+	 * in the full address space.
+	 */
+	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
+
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
 	info.align_offset = 0;
 	addr = vm_unmapped_area(&info);
@@ -119,7 +132,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 		VM_BUG_ON(addr != -ENOMEM);
 		info.flags = 0;
 		info.low_limit = TASK_UNMAPPED_BASE;
-		info.high_limit = TASK_SIZE;
+		info.high_limit = TASK_SIZE_LOW;
 		addr = vm_unmapped_area(&info);
 	}
 
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index b99c1a29dcca..4e5a0449da1c 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -42,9 +42,9 @@ unsigned long task_size_32bit(void)
 	return IA32_PAGE_OFFSET;
 }
 
-unsigned long task_size_64bit(void)
+unsigned long task_size_64bit(int full_addr_space)
 {
-	return TASK_SIZE_MAX;
+	return full_addr_space ? TASK_SIZE_MAX : DEFAULT_MAP_WINDOW;
 }
 
 static unsigned long stack_maxrandom_size(unsigned long task_size)
@@ -140,7 +140,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
 
 	arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
-			arch_rnd(mmap64_rnd_bits), task_size_64bit());
+			arch_rnd(mmap64_rnd_bits), task_size_64bit(0));
 
 #ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
 	/*
-- 
2.11.0


* [PATCH 5/5] x86/mm: Allow userspace have mapping above 47-bit
  2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2017-06-22 12:26 ` [PATCH 4/5] x86/mm: Prepare to expose larger address space to userspace Kirill A. Shutemov
@ 2017-06-22 12:26 ` Kirill A. Shutemov
  2017-06-23  9:06 ` [PATCH 0/5] Last bits for initial 5-level paging enabling Ingo Molnar
  2017-06-28 12:17 ` [PATCH 6/5] x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging Kirill A. Shutemov
  6 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-22 12:26 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

All bits and pieces are now in place and we can allow userspace to have VMAs
above 47-bit.
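
To make the effect concrete, the two limits can be computed directly. A
minimal sketch, assuming 4 KiB pages and assuming that __VIRTUAL_MASK_SHIFT
is 47 with 4-level paging and 56 with 5-level paging (consistent with the
56-bit userspace address space mentioned earlier in the series); the
model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Upper limit of user addresses for a given virtual mask shift. */
static uint64_t model_task_size_max(unsigned int virtual_mask_shift)
{
	return (1ULL << virtual_mask_shift) - MODEL_PAGE_SIZE;
}

/* The default window stays pinned to the 47-bit value. */
static uint64_t model_default_map_window(void)
{
	return (1ULL << 47) - MODEL_PAGE_SIZE;
}

static void model_window_examples(void)
{
	/* 4-level paging: maximum and default window are identical. */
	assert(model_task_size_max(47) == 0x00007ffffffff000ULL);
	assert(model_task_size_max(47) == model_default_map_window());

	/* 5-level paging: the maximum grows, the window does not. */
	assert(model_task_size_max(56) == 0x00fffffffffff000ULL);
	assert(model_task_size_max(56) > model_default_map_window());
}
```

Under these assumptions the gap between the two values is what a high
hint address unlocks; without such a hint, allocations stay below the
default window.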

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/processor.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 42b87689ecd7..0ed01fdd7fa2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -844,9 +844,9 @@ static inline void spin_lock_prefetch(const void *x)
  * particular problem by preventing anything from being mapped
  * at the maximum canonical address.
  */
-#define TASK_SIZE_MAX	((1UL << 47) - PAGE_SIZE)
+#define TASK_SIZE_MAX	((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
 
-#define DEFAULT_MAP_WINDOW	TASK_SIZE_MAX
+#define DEFAULT_MAP_WINDOW	((1UL << 47) - PAGE_SIZE)
 
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
-- 
2.11.0


* Re: [PATCH 0/5] Last bits for initial 5-level paging enabling
  2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2017-06-22 12:26 ` [PATCH 5/5] x86/mm: Allow userspace have mapping above 47-bit Kirill A. Shutemov
@ 2017-06-23  9:06 ` Ingo Molnar
  2017-06-23 14:49   ` Kirill A. Shutemov
  2017-06-28 12:17 ` [PATCH 6/5] x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging Kirill A. Shutemov
  6 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2017-06-23  9:06 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, linux-kernel


* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> As Ingo requested I've split and updated last two patches for my previous
> patchset.
> 
> Please review and consider applying.
> 
> Kirill A. Shutemov (5):
>   x86: Enable 5-level paging support
>   x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
>   x86/mpx: Do not allow MPX if we have mappings above 47-bit
>   x86/mm: Prepare to expose larger address space to userspace
>   x86/mm: Allow userspace have mapping above 47-bit

Ok, looks pretty neat now.

Can I apply them in this order cleanly, without breaking bisection:

>   x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
>   x86/mpx: Do not allow MPX if we have mappings above 47-bit
>   x86/mm: Prepare to expose larger address space to userspace
>   x86/mm: Allow userspace have mapping above 47-bit
>   x86: Enable 5-level paging support

?

I.e. I'd like to move the first patch last.

The reason is that we should first get all quirks and assumptions fixed, all 
facilities implemented - and only then enable 5-level paging as a final step which 
produces a well working kernel.

(This should also make it slightly easier to analyze any potential regressions in 
earlier patches.)

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Last bits for initial 5-level paging enabling
  2017-06-23  9:06 ` [PATCH 0/5] Last bits for initial 5-level paging enabling Ingo Molnar
@ 2017-06-23 14:49   ` Kirill A. Shutemov
  2017-06-29 15:06     ` Kirill A. Shutemov
  0 siblings, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-23 14:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, linux-mm, linux-kernel

On Fri, Jun 23, 2017 at 11:06:01AM +0200, Ingo Molnar wrote:
> 
> * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> 
> > As Ingo requested I've split and updated last two patches for my previous
> > patchset.
> > 
> > Please review and consider applying.
> > 
> > Kirill A. Shutemov (5):
> >   x86: Enable 5-level paging support
> >   x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
> >   x86/mpx: Do not allow MPX if we have mappings above 47-bit
> >   x86/mm: Prepare to expose larger address space to userspace
> >   x86/mm: Allow userspace have mapping above 47-bit
> 
> Ok, looks pretty neat now.
> 
> Can I apply them in this order cleanly, without breaking bisection:
> 
> >   x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
> >   x86/mpx: Do not allow MPX if we have mappings above 47-bit
> >   x86/mm: Prepare to expose larger address space to userspace
> >   x86/mm: Allow userspace have mapping above 47-bit
> >   x86: Enable 5-level paging support
> 
> ?
> 
> I.e. I'd like to move the first patch last.
> 
> The reason is that we should first get all quirks and assumptions fixed, all 
> facilities implemented - and only then enable 5-level paging as a final step which 
> produces a well working kernel.
> 
> (This should also make it slightly easier to analyze any potential regressions in 
> earlier patches.)

Just checked bisectability with this order on allmodconfig -- works fine.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 6/5] x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
  2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2017-06-23  9:06 ` [PATCH 0/5] Last bits for initial 5-level paging enabling Ingo Molnar
@ 2017-06-28 12:17 ` Kirill A. Shutemov
  2017-06-28 16:39   ` Kees Cook
  6 siblings, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-28 12:17 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov, Kees Cook

KASLR uses a hack to detect whether we booted via startup_32() or
startup_64(): it checks what is loaded into cr3 and compares it to
_pgtable. _pgtable is the array of page tables that early code
allocates page tables from.

KASLR expects cr3 to point to _pgtable if we booted via startup_32(), but
that's not true if we booted with 5-level paging enabled. In this case the
top-level page table is allocated separately and only the first p4d page
table is allocated from the array.

Let's modify the check to cover both 4- and 5-level paging cases.

The patch also renames 'level4p' to 'top_level_pgt' as it can now hold
the page table for either the 4th or 5th level, depending on configuration.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/pagetable.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 8e69df96492e..da4cf44d4aac 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -63,7 +63,7 @@ static void *alloc_pgt_page(void *context)
 static struct alloc_pgt_data pgt_data;
 
 /* The top level page table entry pointer. */
-static unsigned long level4p;
+static unsigned long top_level_pgt;
 
 /*
  * Mapping information structure passed to kernel_ident_mapping_init().
@@ -91,9 +91,15 @@ void initialize_identity_maps(void)
 	 * If we came here via startup_32(), cr3 will be _pgtable already
 	 * and we must append to the existing area instead of entirely
 	 * overwriting it.
+	 *
+	 * With 5-level paging, we use _pgtable to allocate the p4d page table;
+	 * the top-level page table is allocated separately.
+	 *
+	 * p4d_offset(top_level_pgt, 0) would cover both 4- and 5-level
+	 * cases. On 4-level paging it's equal to top_level_pgt.
 	 */
-	level4p = read_cr3_pa();
-	if (level4p == (unsigned long)_pgtable) {
+	top_level_pgt = read_cr3_pa();
+	if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
 		debug_putstr("booted via startup_32()\n");
 		pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
 		pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
@@ -103,7 +109,7 @@ void initialize_identity_maps(void)
 		pgt_data.pgt_buf = _pgtable;
 		pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
 		memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
-		level4p = (unsigned long)alloc_pgt_page(&pgt_data);
+		top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
 	}
 }
 
@@ -123,7 +129,7 @@ void add_identity_map(unsigned long start, unsigned long size)
 		return;
 
 	/* Build the mapping. */
-	kernel_ident_mapping_init(&mapping_info, (pgd_t *)level4p,
+	kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
 				  start, end);
 }
 
@@ -134,5 +140,5 @@ void add_identity_map(unsigned long start, unsigned long size)
  */
 void finalize_identity_maps(void)
 {
-	write_cr3(level4p);
+	write_cr3(top_level_pgt);
 }
-- 
2.11.0
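
The check described above can be modelled in user space roughly as follows.
This is only a sketch: it assumes identity-mapped page tables (so a physical
address can be used as a pointer), that entry address bits are 12..51, and
that p4d_offset(pgd, 0) returns the pgd itself when the p4d level is folded.
The sketch_* names and the five_level flag are hypothetical; the kernel
selects the paging mode by configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the address field of a page-table entry is bits 12..51. */
#define SKETCH_ENTRY_ADDR_MASK 0x000ffffffffff000ULL

/*
 * Model of p4d_offset(top_level_pgt, 0).
 * 4-level: the p4d level is folded, so the result is the top-level table.
 * 5-level: the result is the table that entry 0 of the top level points to.
 */
static uintptr_t sketch_p4d_offset0(const uint64_t *top_level_pgt,
				    bool five_level)
{
	if (!five_level)
		return (uintptr_t)top_level_pgt;
	return (uintptr_t)(top_level_pgt[0] & SKETCH_ENTRY_ADDR_MASK);
}

/* True if the first p4d table is the start of the _pgtable-like array. */
static bool sketch_booted_via_startup_32(const uint64_t *top_level_pgt,
					 bool five_level, const void *pgtable)
{
	return sketch_p4d_offset0(top_level_pgt, five_level) ==
	       (uintptr_t)pgtable;
}
```

In the 4-level case this reduces to the old comparison of cr3 against
_pgtable. In the 5-level case cr3 itself differs from _pgtable, but the
table reached through entry 0 still matches.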


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/5] x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging
  2017-06-28 12:17 ` [PATCH 6/5] x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging Kirill A. Shutemov
@ 2017-06-28 16:39   ` Kees Cook
  0 siblings, 0 replies; 12+ messages in thread
From: Kees Cook @ 2017-06-28 16:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Dave Hansen, Andy Lutomirski,
	linux-arch, Linux-MM, LKML

On Wed, Jun 28, 2017 at 5:17 AM, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> KASLR uses hack to detect whether we booted via startup_32() or
> startup_64(): it checks what is loaded into cr3 and compares it to
> _pgtables. _pgtables is the array of page tables where early code
> allocates page table from.
>
> KASLR expects cr3 to point to _pgtables if we booted via startup_32(), but
> that's not true if we booted with 5-level paging enabled. In this case top
> level page table is allocated separately and only the first p4d page table
> is allocated from the array.
>
> Let's modify the check to cover both 4- and 5-level paging cases.
>
> The patch also renames 'level4p' to 'top_level_pgt' as it now can hold
> page table for 4th or 5th level, depending on configuration.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Kees Cook <keescook@chromium.org>

This looks good, thanks!

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/x86/boot/compressed/pagetable.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
> index 8e69df96492e..da4cf44d4aac 100644
> --- a/arch/x86/boot/compressed/pagetable.c
> +++ b/arch/x86/boot/compressed/pagetable.c
> @@ -63,7 +63,7 @@ static void *alloc_pgt_page(void *context)
>  static struct alloc_pgt_data pgt_data;
>
>  /* The top level page table entry pointer. */
> -static unsigned long level4p;
> +static unsigned long top_level_pgt;
>
>  /*
>   * Mapping information structure passed to kernel_ident_mapping_init().
> @@ -91,9 +91,15 @@ void initialize_identity_maps(void)
>          * If we came here via startup_32(), cr3 will be _pgtable already
>          * and we must append to the existing area instead of entirely
>          * overwriting it.
> +        *
> +        * With 5-level paging, we use _pgtable allocate p4d page table,
> +        * top-level page table is allocated separately.
> +        *
> +        * p4d_offset(top_level_pgt, 0) would cover both 4- and 5-level
> +        * cases. On 4-level paging it's equal to top_level_pgt.
>          */
> -       level4p = read_cr3_pa();
> -       if (level4p == (unsigned long)_pgtable) {
> +       top_level_pgt = read_cr3_pa();
> +       if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
>                 debug_putstr("booted via startup_32()\n");
>                 pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
>                 pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
> @@ -103,7 +109,7 @@ void initialize_identity_maps(void)
>                 pgt_data.pgt_buf = _pgtable;
>                 pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
>                 memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
> -               level4p = (unsigned long)alloc_pgt_page(&pgt_data);
> +               top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
>         }
>  }
>
> @@ -123,7 +129,7 @@ void add_identity_map(unsigned long start, unsigned long size)
>                 return;
>
>         /* Build the mapping. */
> -       kernel_ident_mapping_init(&mapping_info, (pgd_t *)level4p,
> +       kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
>                                   start, end);
>  }
>
> @@ -134,5 +140,5 @@ void add_identity_map(unsigned long start, unsigned long size)
>   */
>  void finalize_identity_maps(void)
>  {
> -       write_cr3(level4p);
> +       write_cr3(top_level_pgt);
>  }
> --
> 2.11.0
>



-- 
Kees Cook
Pixel Security


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Last bits for initial 5-level paging enabling
  2017-06-23 14:49   ` Kirill A. Shutemov
@ 2017-06-29 15:06     ` Kirill A. Shutemov
  2017-06-30  6:59       ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2017-06-29 15:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, linux-mm, linux-kernel

On Fri, Jun 23, 2017 at 05:49:15PM +0300, Kirill A. Shutemov wrote:
> On Fri, Jun 23, 2017 at 11:06:01AM +0200, Ingo Molnar wrote:
> > 
> > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> > 
> > > As Ingo requested I've split and updated last two patches for my previous
> > > patchset.
> > > 
> > > Please review and consider applying.
> > > 
> > > Kirill A. Shutemov (5):
> > >   x86: Enable 5-level paging support
> > >   x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
> > >   x86/mpx: Do not allow MPX if we have mappings above 47-bit
> > >   x86/mm: Prepare to expose larger address space to userspace
> > >   x86/mm: Allow userspace have mapping above 47-bit
> > 
> > Ok, looks pretty neat now.
> > 
> > Can I apply them in this order cleanly, without breaking bisection:
> > 
> > >   x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
> > >   x86/mpx: Do not allow MPX if we have mappings above 47-bit
> > >   x86/mm: Prepare to expose larger address space to userspace
> > >   x86/mm: Allow userspace have mapping above 47-bit
> > >   x86: Enable 5-level paging support
> > 
> > ?
> > 
> > I.e. I'd like to move the first patch last.
> > 
> > The reason is that we should first get all quirks and assumptions fixed, all 
> > facilities implemented - and only then enable 5-level paging as a final step which 
> > produces a well working kernel.
> > 
> > (This should also make it slightly easier to analyze any potential regressions in 
> > earlier patches.)
> 
> Just checked bisectability with this order on allmodconfig -- works fine.

Ingo, if there are no objections, can we get these applied?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Last bits for initial 5-level paging enabling
  2017-06-29 15:06     ` Kirill A. Shutemov
@ 2017-06-30  6:59       ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2017-06-30  6:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Dave Hansen, Andy Lutomirski, linux-arch, linux-mm, linux-kernel


* Kirill A. Shutemov <kirill@shutemov.name> wrote:

> > > Can I apply them in this order cleanly, without breaking bisection:
> > > 
> > > >   x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit
> > > >   x86/mpx: Do not allow MPX if we have mappings above 47-bit
> > > >   x86/mm: Prepare to expose larger address space to userspace
> > > >   x86/mm: Allow userspace have mapping above 47-bit
> > > >   x86: Enable 5-level paging support
> > > 
> > > ?
> > > 
> > > I.e. I'd like to move the first patch last.
> > > 
> > > The reason is that we should first get all quirks and assumptions fixed, all 
> > > facilities implemented - and only then enable 5-level paging as a final step which 
> > > produces a well working kernel.
> > > 
> > > (This should also make it slightly easier to analyze any potential regressions in 
> > > earlier patches.)
> > 
> > Just checked bisectability with this order on allmodconfig -- works fine.
> 
> Ingo, if there's no objections, can we get these applied?

Just this week, which is the final week of the development window, we had two 
fixes for the 5-level pagetables commits, so we need to delay the rest to right 
after -rc1.

Could you please resend them (and any followup patches), in the suggested 
order? I don't see any conceptual problems, so this is only about timing and 
maximizing stability.

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-06-30  6:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-22 12:26 [PATCH 0/5] Last bits for initial 5-level paging enabling Kirill A. Shutemov
2017-06-22 12:26 ` [PATCH 1/5] x86: Enable 5-level paging support Kirill A. Shutemov
2017-06-22 12:26 ` [PATCH 2/5] x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit Kirill A. Shutemov
2017-06-22 12:26 ` [PATCH 3/5] x86/mpx: Do not allow MPX if we have mappings above 47-bit Kirill A. Shutemov
2017-06-22 12:26 ` [PATCH 4/5] x86/mm: Prepare to expose larger address space to userspace Kirill A. Shutemov
2017-06-22 12:26 ` [PATCH 5/5] x86/mm: Allow userspace have mapping above 47-bit Kirill A. Shutemov
2017-06-23  9:06 ` [PATCH 0/5] Last bits for initial 5-level paging enabling Ingo Molnar
2017-06-23 14:49   ` Kirill A. Shutemov
2017-06-29 15:06     ` Kirill A. Shutemov
2017-06-30  6:59       ` Ingo Molnar
2017-06-28 12:17 ` [PATCH 6/5] x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging Kirill A. Shutemov
2017-06-28 16:39   ` Kees Cook
