All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
@ 2017-11-15 14:36 ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-15 14:36 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm,
	linux-kernel, Kirill A. Shutemov

In case of 5-level paging, the kernel does not place any mapping above
47-bit, unless userspace explicitly asks for it.

Userspace can request an allocation from the full address space by
specifying the mmap address hint above 47-bit.

Nicholas noticed that the current implementation violates this interface:

  If user space requests a mapping at the end of the 47-bit address space
  with a length which causes the mapping to cross the 47-bit border
  (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
  below and above.

Sanity check the mmap address hint so that start and end of the resulting
vma are on the same side of the 47-bit border. If that's not the case fall
back to the code path which ignores the address hint and allocate from the
regular address space below 47-bit.

[ tglx: Moved the address check to a function and massaged comment and
  	changelog ]

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/elf.h   |  1 +
 arch/x86/kernel/sys_x86_64.c | 10 +++++++---
 arch/x86/mm/hugetlbpage.c    | 11 ++++++++---
 arch/x86/mm/mmap.c           | 46 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 3a091cea36c5..0d157d2a1e2a 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void)
 extern unsigned long task_size_32bit(void);
 extern unsigned long task_size_64bit(int full_addr_space);
 extern unsigned long get_mmap_base(int is_legacy);
+extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
 
 #ifdef CONFIG_X86_32
 
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index a63fe77b3217..676774b9bb8d 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
+	/* No address checking. See comment at mmap_address_hint_valid() */
 	if (flags & MAP_FIXED)
 		return addr;
 
@@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 	/* requesting a specific address */
 	if (addr) {
-		addr = PAGE_ALIGN(addr);
+		addr &= PAGE_MASK;
+		if (!mmap_address_hint_valid(addr, len))
+			goto get_unmapped_area;
+
 		vma = find_vma(mm, addr);
-		if (TASK_SIZE - len >= addr &&
-				(!vma || addr + len <= vm_start_gap(vma)))
+		if (!vma || addr + len <= vm_start_gap(vma))
 			return addr;
 	}
+get_unmapped_area:
 
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 8ae0000cbdb3..00b296617ca4 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
+	/* No address checking. See comment at mmap_address_hint_valid() */
 	if (flags & MAP_FIXED) {
 		if (prepare_hugepage_range(file, addr, len))
 			return -EINVAL;
@@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	}
 
 	if (addr) {
-		addr = ALIGN(addr, huge_page_size(h));
+		addr &= huge_page_mask(h);
+		if (!mmap_address_hint_valid(addr, len))
+			goto get_unmapped_area;
+
 		vma = find_vma(mm, addr);
-		if (TASK_SIZE - len >= addr &&
-		    (!vma || addr + len <= vm_start_gap(vma)))
+		if (!vma || addr + len <= vm_start_gap(vma))
 			return addr;
 	}
+
+get_unmapped_area:
 	if (mm->get_unmapped_area == arch_get_unmapped_area)
 		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
 				pgoff, flags);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index a99679826846..62285fe77b0f 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 		return "[mpx]";
 	return NULL;
 }
+
+/**
+ * mmap_address_hint_valid - Validate the address hint of mmap
+ * @addr:	Address hint
+ * @len:	Mapping length
+ *
+ * Check whether @addr and @addr + @len result in a valid mapping.
+ *
+ * On 32bit this only checks whether @addr + @len is <= TASK_SIZE.
+ *
+ * On 64bit with 5-level page tables another sanity check is required
+ * because mappings requested by mmap(@addr, 0) which cross the 47-bit
+ * virtual address boundary can cause the following theoretical issue:
+ *
+ *  An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr
+ *  is below the border of the 47-bit address space and @addr + @len is
+ *  above the border.
+ *
+ *  With 4-level paging this request succeeds, but the resulting mapping
+ *  address will always be within the 47-bit virtual address space, because
+ *  the hint address does not result in a valid mapping and is
+ *  ignored. Hence applications which are not prepared to handle virtual
+ *  addresses above 47-bit work correctly.
+ *
+ *  With 5-level paging this request would be granted and result in a
+ *  mapping which crosses the border of the 47-bit virtual address
+ *  space. If the application cannot handle addresses above 47-bit this
+ *  will lead to misbehaviour and hard to diagnose failures.
+ *
+ * Therefore ignore address hints which would result in a mapping crossing
+ * the 47-bit virtual address boundary.
+ *
+ * Note, that in the same scenario with MAP_FIXED the behaviour is
+ * different. The request with @addr < 47-bit and @addr + @len > 47-bit
+ * fails on a 4-level paging machine but succeeds on a 5-level paging
+ * machine. It is reasonable to expect that an application does not rely on
+ * the failure of such a fixed mapping request, so the restriction is not
+ * applied.
+ */
+bool mmap_address_hint_valid(unsigned long addr, unsigned long len)
+{
+	if (TASK_SIZE - len < addr)
+		return false;
+
+	return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW);
+}
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
@ 2017-11-15 14:36 ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-15 14:36 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm,
	linux-kernel, Kirill A. Shutemov

In case of 5-level paging, the kernel does not place any mapping above
47-bit, unless userspace explicitly asks for it.

Userspace can request an allocation from the full address space by
specifying the mmap address hint above 47-bit.

Nicholas noticed that the current implementation violates this interface:

  If user space requests a mapping at the end of the 47-bit address space
  with a length which causes the mapping to cross the 47-bit border
  (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
  below and above.

Sanity check the mmap address hint so that start and end of the resulting
vma are on the same side of the 47-bit border. If that's not the case fall
back to the code path which ignores the address hint and allocate from the
regular address space below 47-bit.

[ tglx: Moved the address check to a function and massaged comment and
  	changelog ]

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/elf.h   |  1 +
 arch/x86/kernel/sys_x86_64.c | 10 +++++++---
 arch/x86/mm/hugetlbpage.c    | 11 ++++++++---
 arch/x86/mm/mmap.c           | 46 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 3a091cea36c5..0d157d2a1e2a 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void)
 extern unsigned long task_size_32bit(void);
 extern unsigned long task_size_64bit(int full_addr_space);
 extern unsigned long get_mmap_base(int is_legacy);
+extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
 
 #ifdef CONFIG_X86_32
 
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index a63fe77b3217..676774b9bb8d 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
+	/* No address checking. See comment at mmap_address_hint_valid() */
 	if (flags & MAP_FIXED)
 		return addr;
 
@@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 	/* requesting a specific address */
 	if (addr) {
-		addr = PAGE_ALIGN(addr);
+		addr &= PAGE_MASK;
+		if (!mmap_address_hint_valid(addr, len))
+			goto get_unmapped_area;
+
 		vma = find_vma(mm, addr);
-		if (TASK_SIZE - len >= addr &&
-				(!vma || addr + len <= vm_start_gap(vma)))
+		if (!vma || addr + len <= vm_start_gap(vma))
 			return addr;
 	}
+get_unmapped_area:
 
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 8ae0000cbdb3..00b296617ca4 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
+	/* No address checking. See comment at mmap_address_hint_valid() */
 	if (flags & MAP_FIXED) {
 		if (prepare_hugepage_range(file, addr, len))
 			return -EINVAL;
@@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	}
 
 	if (addr) {
-		addr = ALIGN(addr, huge_page_size(h));
+		addr &= huge_page_mask(h);
+		if (!mmap_address_hint_valid(addr, len))
+			goto get_unmapped_area;
+
 		vma = find_vma(mm, addr);
-		if (TASK_SIZE - len >= addr &&
-		    (!vma || addr + len <= vm_start_gap(vma)))
+		if (!vma || addr + len <= vm_start_gap(vma))
 			return addr;
 	}
+
+get_unmapped_area:
 	if (mm->get_unmapped_area == arch_get_unmapped_area)
 		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
 				pgoff, flags);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index a99679826846..62285fe77b0f 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 		return "[mpx]";
 	return NULL;
 }
+
+/**
+ * mmap_address_hint_valid - Validate the address hint of mmap
+ * @addr:	Address hint
+ * @len:	Mapping length
+ *
+ * Check whether @addr and @addr + @len result in a valid mapping.
+ *
+ * On 32bit this only checks whether @addr + @len is <= TASK_SIZE.
+ *
+ * On 64bit with 5-level page tables another sanity check is required
+ * because mappings requested by mmap(@addr, 0) which cross the 47-bit
+ * virtual address boundary can cause the following theoretical issue:
+ *
+ *  An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr
+ *  is below the border of the 47-bit address space and @addr + @len is
+ *  above the border.
+ *
+ *  With 4-level paging this request succeeds, but the resulting mapping
+ *  address will always be within the 47-bit virtual address space, because
+ *  the hint address does not result in a valid mapping and is
+ *  ignored. Hence applications which are not prepared to handle virtual
+ *  addresses above 47-bit work correctly.
+ *
+ *  With 5-level paging this request would be granted and result in a
+ *  mapping which crosses the border of the 47-bit virtual address
+ *  space. If the application cannot handle addresses above 47-bit this
+ *  will lead to misbehaviour and hard to diagnose failures.
+ *
+ * Therefore ignore address hints which would result in a mapping crossing
+ * the 47-bit virtual address boundary.
+ *
+ * Note, that in the same scenario with MAP_FIXED the behaviour is
+ * different. The request with @addr < 47-bit and @addr + @len > 47-bit
+ * fails on a 4-level paging machine but succeeds on a 5-level paging
+ * machine. It is reasonable to expect that an application does not rely on
+ * the failure of such a fixed mapping request, so the restriction is not
+ * applied.
+ */
+bool mmap_address_hint_valid(unsigned long addr, unsigned long len)
+{
+	if (TASK_SIZE - len < addr)
+		return false;
+
+	return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW);
+}
-- 
2.15.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
  2017-11-15 14:36 ` Kirill A. Shutemov
@ 2017-11-15 14:36   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-15 14:36 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm,
	linux-kernel, Kirill A. Shutemov

With 5-level paging, we have 56-bit virtual address space available for
userspace. But we don't want to expose userspace to addresses above
47-bits, unless it asked specifically for it.

We use mmap(2) hint address as a way for kernel to know if it's okay to
allocate virtual memory above 47-bit.

Let's add a self-test that covers few corner cases of the interface.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/5lvl.c   | 177 +++++++++++++++++++++++++++++++++++
 tools/testing/selftests/x86/Makefile |   2 +-
 2 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/5lvl.c

diff --git a/tools/testing/selftests/x86/5lvl.c b/tools/testing/selftests/x86/5lvl.c
new file mode 100644
index 000000000000..2eafdcd4c2b3
--- /dev/null
+++ b/tools/testing/selftests/x86/5lvl.c
@@ -0,0 +1,177 @@
+#include <stdio.h>
+#include <sys/mman.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+#define PAGE_SIZE	4096
+#define LOW_ADDR	((void *) (1UL << 30))
+#define HIGH_ADDR	((void *) (1UL << 50))
+
+struct testcase {
+	void *addr;
+	unsigned long size;
+	unsigned long flags;
+	const char *msg;
+	unsigned int low_addr_required:1;
+	unsigned int keep_mapped:1;
+};
+
+static struct testcase testcases[] = {
+	{
+		.addr = NULL,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE / 2),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE / 2)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - PAGE_SIZE, 2 * PAGE_SIZE, MAP_FIXED)",
+	},
+	{
+		.addr = NULL,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - (2UL << 20)),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - (2UL << 20), 4UL << 20, MAP_FIXED | MAP_HUGETLB)",
+	},
+};
+
+int main(int argc, char **argv)
+{
+	int i;
+	void *p;
+
+	for (i = 0; i < ARRAY_SIZE(testcases); i++) {
+		struct testcase *t = testcases + i;
+
+		p = mmap(t->addr, t->size, PROT_NONE, t->flags, -1, 0);
+
+		printf("%s: %p - ", t->msg, p);
+
+		if (p == MAP_FAILED) {
+			printf("FAILED\n");
+			continue;
+		}
+
+		if (t->low_addr_required && p >= (void *)(1UL << 47))
+			printf("FAILED\n");
+		else
+			printf("OK\n");
+		if (!t->keep_mapped)
+			munmap(p, t->size);
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 7b1adeee4b0f..939a337128db 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -11,7 +11,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip
+TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip 5lvl
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
@ 2017-11-15 14:36   ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-15 14:36 UTC (permalink / raw)
  To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm,
	linux-kernel, Kirill A. Shutemov

With 5-level paging, we have 56-bit virtual address space available for
userspace. But we don't want to expose userspace to addresses above
47-bits, unless it asked specifically for it.

We use mmap(2) hint address as a way for kernel to know if it's okay to
allocate virtual memory above 47-bit.

Let's add a self-test that covers few corner cases of the interface.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 tools/testing/selftests/x86/5lvl.c   | 177 +++++++++++++++++++++++++++++++++++
 tools/testing/selftests/x86/Makefile |   2 +-
 2 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/5lvl.c

diff --git a/tools/testing/selftests/x86/5lvl.c b/tools/testing/selftests/x86/5lvl.c
new file mode 100644
index 000000000000..2eafdcd4c2b3
--- /dev/null
+++ b/tools/testing/selftests/x86/5lvl.c
@@ -0,0 +1,177 @@
+#include <stdio.h>
+#include <sys/mman.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+#define PAGE_SIZE	4096
+#define LOW_ADDR	((void *) (1UL << 30))
+#define HIGH_ADDR	((void *) (1UL << 50))
+
+struct testcase {
+	void *addr;
+	unsigned long size;
+	unsigned long flags;
+	const char *msg;
+	unsigned int low_addr_required:1;
+	unsigned int keep_mapped:1;
+};
+
+static struct testcase testcases[] = {
+	{
+		.addr = NULL,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE / 2),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE / 2)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - PAGE_SIZE, 2 * PAGE_SIZE, MAP_FIXED)",
+	},
+	{
+		.addr = NULL,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - (2UL << 20)),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - (2UL << 20), 4UL << 20, MAP_FIXED | MAP_HUGETLB)",
+	},
+};
+
+int main(int argc, char **argv)
+{
+	int i;
+	void *p;
+
+	for (i = 0; i < ARRAY_SIZE(testcases); i++) {
+		struct testcase *t = testcases + i;
+
+		p = mmap(t->addr, t->size, PROT_NONE, t->flags, -1, 0);
+
+		printf("%s: %p - ", t->msg, p);
+
+		if (p == MAP_FAILED) {
+			printf("FAILED\n");
+			continue;
+		}
+
+		if (t->low_addr_required && p >= (void *)(1UL << 47))
+			printf("FAILED\n");
+		else
+			printf("OK\n");
+		if (!t->keep_mapped)
+			munmap(p, t->size);
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 7b1adeee4b0f..939a337128db 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -11,7 +11,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip
+TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip 5lvl
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
-- 
2.15.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip:x86/urgent] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
  2017-11-15 14:36 ` Kirill A. Shutemov
  (?)
  (?)
@ 2017-11-16 10:46 ` tip-bot for Kirill A. Shutemov
  -1 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2017-11-16 10:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: kirill.shutemov, hpa, tglx, luto, linux-kernel, npiggin, torvalds, mingo

Commit-ID:  1e0f25dbf2464df8445dd40881f4d9e732434947
Gitweb:     https://git.kernel.org/tip/1e0f25dbf2464df8445dd40881f4d9e732434947
Author:     Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Wed, 15 Nov 2017 17:36:06 +0300
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 16 Nov 2017 11:43:11 +0100

x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border

In case of 5-level paging, the kernel does not place any mapping above
47-bit, unless userspace explicitly asks for it.

Userspace can request an allocation from the full address space by
specifying the mmap address hint above 47-bit.

Nicholas noticed that the current implementation violates this interface:

  If user space requests a mapping at the end of the 47-bit address space
  with a length which causes the mapping to cross the 47-bit border
  (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
  below and above.

Sanity check the mmap address hint so that start and end of the resulting
vma are on the same side of the 47-bit border. If that's not the case fall
back to the code path which ignores the address hint and allocate from the
regular address space below 47-bit.

To make the checks consistent, mask out the address hints lower bits
(either PAGE_MASK or huge_page_mask()) instead of using ALIGN() which can
push them up to the next boundary.

[ tglx: Moved the address check to a function and massaged comment and
  	changelog ]

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171115143607.81541-1-kirill.shutemov@linux.intel.com

---
 arch/x86/include/asm/elf.h   |  1 +
 arch/x86/kernel/sys_x86_64.c | 10 +++++++---
 arch/x86/mm/hugetlbpage.c    | 11 ++++++++---
 arch/x86/mm/mmap.c           | 46 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 3a091ce..0d157d2 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void)
 extern unsigned long task_size_32bit(void);
 extern unsigned long task_size_64bit(int full_addr_space);
 extern unsigned long get_mmap_base(int is_legacy);
+extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
 
 #ifdef CONFIG_X86_32
 
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index a63fe77..676774b 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
+	/* No address checking. See comment at mmap_address_hint_valid() */
 	if (flags & MAP_FIXED)
 		return addr;
 
@@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 	/* requesting a specific address */
 	if (addr) {
-		addr = PAGE_ALIGN(addr);
+		addr &= PAGE_MASK;
+		if (!mmap_address_hint_valid(addr, len))
+			goto get_unmapped_area;
+
 		vma = find_vma(mm, addr);
-		if (TASK_SIZE - len >= addr &&
-				(!vma || addr + len <= vm_start_gap(vma)))
+		if (!vma || addr + len <= vm_start_gap(vma))
 			return addr;
 	}
+get_unmapped_area:
 
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 8ae0000..00b2966 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	if (len > TASK_SIZE)
 		return -ENOMEM;
 
+	/* No address checking. See comment at mmap_address_hint_valid() */
 	if (flags & MAP_FIXED) {
 		if (prepare_hugepage_range(file, addr, len))
 			return -EINVAL;
@@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 	}
 
 	if (addr) {
-		addr = ALIGN(addr, huge_page_size(h));
+		addr &= huge_page_mask(h);
+		if (!mmap_address_hint_valid(addr, len))
+			goto get_unmapped_area;
+
 		vma = find_vma(mm, addr);
-		if (TASK_SIZE - len >= addr &&
-		    (!vma || addr + len <= vm_start_gap(vma)))
+		if (!vma || addr + len <= vm_start_gap(vma))
 			return addr;
 	}
+
+get_unmapped_area:
 	if (mm->get_unmapped_area == arch_get_unmapped_area)
 		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
 				pgoff, flags);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index a996798..62285fe 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 		return "[mpx]";
 	return NULL;
 }
+
+/**
+ * mmap_address_hint_valid - Validate the address hint of mmap
+ * @addr:	Address hint
+ * @len:	Mapping length
+ *
+ * Check whether @addr and @addr + @len result in a valid mapping.
+ *
+ * On 32bit this only checks whether @addr + @len is <= TASK_SIZE.
+ *
+ * On 64bit with 5-level page tables another sanity check is required
+ * because mappings requested by mmap(@addr, 0) which cross the 47-bit
+ * virtual address boundary can cause the following theoretical issue:
+ *
+ *  An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr
+ *  is below the border of the 47-bit address space and @addr + @len is
+ *  above the border.
+ *
+ *  With 4-level paging this request succeeds, but the resulting mapping
+ *  address will always be within the 47-bit virtual address space, because
+ *  the hint address does not result in a valid mapping and is
+ *  ignored. Hence applications which are not prepared to handle virtual
+ *  addresses above 47-bit work correctly.
+ *
+ *  With 5-level paging this request would be granted and result in a
+ *  mapping which crosses the border of the 47-bit virtual address
+ *  space. If the application cannot handle addresses above 47-bit this
+ *  will lead to misbehaviour and hard to diagnose failures.
+ *
+ * Therefore ignore address hints which would result in a mapping crossing
+ * the 47-bit virtual address boundary.
+ *
+ * Note, that in the same scenario with MAP_FIXED the behaviour is
+ * different. The request with @addr < 47-bit and @addr + @len > 47-bit
+ * fails on a 4-level paging machine but succeeds on a 5-level paging
+ * machine. It is reasonable to expect that an application does not rely on
+ * the failure of such a fixed mapping request, so the restriction is not
+ * applied.
+ */
+bool mmap_address_hint_valid(unsigned long addr, unsigned long len)
+{
+	if (TASK_SIZE - len < addr)
+		return false;
+
+	return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW);
+}

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip:x86/urgent] x86/selftests: Add test for mapping placement for 5-level paging
  2017-11-15 14:36   ` Kirill A. Shutemov
  (?)
@ 2017-11-16 10:46   ` tip-bot for Kirill A. Shutemov
  -1 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Kirill A. Shutemov @ 2017-11-16 10:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: npiggin, luto, kirill.shutemov, hpa, torvalds, tglx, mingo, linux-kernel

Commit-ID:  97f404ad3e53bf9ac598745066ba2f57c1da3039
Gitweb:     https://git.kernel.org/tip/97f404ad3e53bf9ac598745066ba2f57c1da3039
Author:     Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate: Wed, 15 Nov 2017 17:36:07 +0300
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 16 Nov 2017 11:43:12 +0100

x86/selftests: Add test for mapping placement for 5-level paging

5-level paging provides a 56-bit virtual address space for user space
application. But the kernel defaults to mappings below the 47-bit address
space boundary, which is the upper bound for 4-level paging, unless an
application explicitely request it by using a mmap(2) address hint above
the 47-bit boundary. The kernel prevents mappings which spawn across the
47-bit boundary unless mmap(2) was invoked with MAP_FIXED.

Add a self-test that covers the corner cases of the interface and validates
the correctness of the implementation.

[ tglx: Massaged changelog once more ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20171115143607.81541-2-kirill.shutemov@linux.intel.com

---
 tools/testing/selftests/x86/5lvl.c   | 177 +++++++++++++++++++++++++++++++++++
 tools/testing/selftests/x86/Makefile |   2 +-
 2 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/5lvl.c b/tools/testing/selftests/x86/5lvl.c
new file mode 100644
index 0000000..2eafdcd
--- /dev/null
+++ b/tools/testing/selftests/x86/5lvl.c
@@ -0,0 +1,177 @@
+#include <stdio.h>
+#include <sys/mman.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+#define PAGE_SIZE	4096
+#define LOW_ADDR	((void *) (1UL << 30))
+#define HIGH_ADDR	((void *) (1UL << 50))
+
+struct testcase {
+	void *addr;
+	unsigned long size;
+	unsigned long flags;
+	const char *msg;
+	unsigned int low_addr_required:1;
+	unsigned int keep_mapped:1;
+};
+
+static struct testcase testcases[] = {
+	{
+		.addr = NULL,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE / 2),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 2 * PAGE_SIZE / 2)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 2 * PAGE_SIZE,
+		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - PAGE_SIZE, 2 * PAGE_SIZE, MAP_FIXED)",
+	},
+	{
+		.addr = NULL,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(NULL, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = LOW_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
+		.low_addr_required = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = HIGH_ADDR,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB)",
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void*) -1,
+		.size = 2UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap(-1, MAP_HUGETLB) again",
+	},
+	{
+		.addr = (void *)((1UL << 47) - PAGE_SIZE),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
+		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
+		.low_addr_required = 1,
+		.keep_mapped = 1,
+	},
+	{
+		.addr = (void *)((1UL << 47) - (2UL << 20)),
+		.size = 4UL << 20,
+		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
+		.msg = "mmap((1UL << 47) - (2UL << 20), 4UL << 20, MAP_FIXED | MAP_HUGETLB)",
+	},
+};
+
+int main(int argc, char **argv)
+{
+	int i;
+	void *p;
+
+	for (i = 0; i < ARRAY_SIZE(testcases); i++) {
+		struct testcase *t = testcases + i;
+
+		p = mmap(t->addr, t->size, PROT_NONE, t->flags, -1, 0);
+
+		printf("%s: %p - ", t->msg, p);
+
+		if (p == MAP_FAILED) {
+			printf("FAILED\n");
+			continue;
+		}
+
+		if (t->low_addr_required && p >= (void *)(1UL << 47))
+			printf("FAILED\n");
+		else
+			printf("OK\n");
+		if (!t->keep_mapped)
+			munmap(p, t->size);
+	}
+	return 0;
+}
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 7b1adee..939a337 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -11,7 +11,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip
+TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip 5lvl
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
  2017-11-15 14:36 ` Kirill A. Shutemov
@ 2017-11-16 13:23   ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-11-16 13:23 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin,
	Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm,
	linux-kernel

On Wed 15-11-17 17:36:06, Kirill A. Shutemov wrote:
> In case of 5-level paging, the kernel does not place any mapping above
> 47-bit, unless userspace explicitly asks for it.
> 
> Userspace can request an allocation from the full address space by
> specifying the mmap address hint above 47-bit.
> 
> Nicholas noticed that the current implementation violates this interface:
> 
>   If user space requests a mapping at the end of the 47-bit address space
>   with a length which causes the mapping to cross the 47-bit border
>   (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
>   below and above.
> 
> Sanity check the mmap address hint so that start and end of the resulting
> vma are on the same side of the 47-bit border. If that's not the case fall
> back to the code path which ignores the address hint and allocate from the
> regular address space below 47-bit.
> 
> [ tglx: Moved the address check to a function and massaged comment and
>   	changelog ]
> 
> Reported-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

FWIW
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  arch/x86/include/asm/elf.h   |  1 +
>  arch/x86/kernel/sys_x86_64.c | 10 +++++++---
>  arch/x86/mm/hugetlbpage.c    | 11 ++++++++---
>  arch/x86/mm/mmap.c           | 46 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 62 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 3a091cea36c5..0d157d2a1e2a 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void)
>  extern unsigned long task_size_32bit(void);
>  extern unsigned long task_size_64bit(int full_addr_space);
>  extern unsigned long get_mmap_base(int is_legacy);
> +extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
>  
>  #ifdef CONFIG_X86_32
>  
> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index a63fe77b3217..676774b9bb8d 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  	if (len > TASK_SIZE)
>  		return -ENOMEM;
>  
> +	/* No address checking. See comment at mmap_address_hint_valid() */
>  	if (flags & MAP_FIXED)
>  		return addr;
>  
> @@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  
>  	/* requesting a specific address */
>  	if (addr) {
> -		addr = PAGE_ALIGN(addr);
> +		addr &= PAGE_MASK;
> +		if (!mmap_address_hint_valid(addr, len))
> +			goto get_unmapped_area;
> +
>  		vma = find_vma(mm, addr);
> -		if (TASK_SIZE - len >= addr &&
> -				(!vma || addr + len <= vm_start_gap(vma)))
> +		if (!vma || addr + len <= vm_start_gap(vma))
>  			return addr;
>  	}
> +get_unmapped_area:
>  
>  	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>  	info.length = len;
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 8ae0000cbdb3..00b296617ca4 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  	if (len > TASK_SIZE)
>  		return -ENOMEM;
>  
> +	/* No address checking. See comment at mmap_address_hint_valid() */
>  	if (flags & MAP_FIXED) {
>  		if (prepare_hugepage_range(file, addr, len))
>  			return -EINVAL;
> @@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  	}
>  
>  	if (addr) {
> -		addr = ALIGN(addr, huge_page_size(h));
> +		addr &= huge_page_mask(h);
> +		if (!mmap_address_hint_valid(addr, len))
> +			goto get_unmapped_area;
> +
>  		vma = find_vma(mm, addr);
> -		if (TASK_SIZE - len >= addr &&
> -		    (!vma || addr + len <= vm_start_gap(vma)))
> +		if (!vma || addr + len <= vm_start_gap(vma))
>  			return addr;
>  	}
> +
> +get_unmapped_area:
>  	if (mm->get_unmapped_area == arch_get_unmapped_area)
>  		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>  				pgoff, flags);
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index a99679826846..62285fe77b0f 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma)
>  		return "[mpx]";
>  	return NULL;
>  }
> +
> +/**
> + * mmap_address_hint_valid - Validate the address hint of mmap
> + * @addr:	Address hint
> + * @len:	Mapping length
> + *
> + * Check whether @addr and @addr + @len result in a valid mapping.
> + *
> + * On 32bit this only checks whether @addr + @len is <= TASK_SIZE.
> + *
> + * On 64bit with 5-level page tables another sanity check is required
> + * because mappings requested by mmap(@addr, 0) which cross the 47-bit
> + * virtual address boundary can cause the following theoretical issue:
> + *
> + *  An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr
> + *  is below the border of the 47-bit address space and @addr + @len is
> + *  above the border.
> + *
> + *  With 4-level paging this request succeeds, but the resulting mapping
> + *  address will always be within the 47-bit virtual address space, because
> + *  the hint address does not result in a valid mapping and is
> + *  ignored. Hence applications which are not prepared to handle virtual
> + *  addresses above 47-bit work correctly.
> + *
> + *  With 5-level paging this request would be granted and result in a
> + *  mapping which crosses the border of the 47-bit virtual address
> + *  space. If the application cannot handle addresses above 47-bit this
> + *  will lead to misbehaviour and hard to diagnose failures.
> + *
> + * Therefore ignore address hints which would result in a mapping crossing
> + * the 47-bit virtual address boundary.
> + *
> + * Note, that in the same scenario with MAP_FIXED the behaviour is
> + * different. The request with @addr < 47-bit and @addr + @len > 47-bit
> + * fails on a 4-level paging machine but succeeds on a 5-level paging
> + * machine. It is reasonable to expect that an application does not rely on
> + * the failure of such a fixed mapping request, so the restriction is not
> + * applied.
> + */
> +bool mmap_address_hint_valid(unsigned long addr, unsigned long len)
> +{
> +	if (TASK_SIZE - len < addr)
> +		return false;
> +
> +	return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW);
> +}
> -- 
> 2.15.0
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border
@ 2017-11-16 13:23   ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-11-16 13:23 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin,
	Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm,
	linux-kernel

On Wed 15-11-17 17:36:06, Kirill A. Shutemov wrote:
> In case of 5-level paging, the kernel does not place any mapping above
> 47-bit, unless userspace explicitly asks for it.
> 
> Userspace can request an allocation from the full address space by
> specifying the mmap address hint above 47-bit.
> 
> Nicholas noticed that the current implementation violates this interface:
> 
>   If user space requests a mapping at the end of the 47-bit address space
>   with a length which causes the mapping to cross the 47-bit border
>   (DEFAULT_MAP_WINDOW), then the vma is partially in the address space
>   below and above.
> 
> Sanity check the mmap address hint so that start and end of the resulting
> vma are on the same side of the 47-bit border. If that's not the case fall
> back to the code path which ignores the address hint and allocate from the
> regular address space below 47-bit.
> 
> [ tglx: Moved the address check to a function and massaged comment and
>   	changelog ]
> 
> Reported-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

FWIW
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  arch/x86/include/asm/elf.h   |  1 +
>  arch/x86/kernel/sys_x86_64.c | 10 +++++++---
>  arch/x86/mm/hugetlbpage.c    | 11 ++++++++---
>  arch/x86/mm/mmap.c           | 46 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 62 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 3a091cea36c5..0d157d2a1e2a 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void)
>  extern unsigned long task_size_32bit(void);
>  extern unsigned long task_size_64bit(int full_addr_space);
>  extern unsigned long get_mmap_base(int is_legacy);
> +extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len);
>  
>  #ifdef CONFIG_X86_32
>  
> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index a63fe77b3217..676774b9bb8d 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  	if (len > TASK_SIZE)
>  		return -ENOMEM;
>  
> +	/* No address checking. See comment at mmap_address_hint_valid() */
>  	if (flags & MAP_FIXED)
>  		return addr;
>  
> @@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  
>  	/* requesting a specific address */
>  	if (addr) {
> -		addr = PAGE_ALIGN(addr);
> +		addr &= PAGE_MASK;
> +		if (!mmap_address_hint_valid(addr, len))
> +			goto get_unmapped_area;
> +
>  		vma = find_vma(mm, addr);
> -		if (TASK_SIZE - len >= addr &&
> -				(!vma || addr + len <= vm_start_gap(vma)))
> +		if (!vma || addr + len <= vm_start_gap(vma))
>  			return addr;
>  	}
> +get_unmapped_area:
>  
>  	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>  	info.length = len;
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 8ae0000cbdb3..00b296617ca4 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  	if (len > TASK_SIZE)
>  		return -ENOMEM;
>  
> +	/* No address checking. See comment at mmap_address_hint_valid() */
>  	if (flags & MAP_FIXED) {
>  		if (prepare_hugepage_range(file, addr, len))
>  			return -EINVAL;
> @@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>  	}
>  
>  	if (addr) {
> -		addr = ALIGN(addr, huge_page_size(h));
> +		addr &= huge_page_mask(h);
> +		if (!mmap_address_hint_valid(addr, len))
> +			goto get_unmapped_area;
> +
>  		vma = find_vma(mm, addr);
> -		if (TASK_SIZE - len >= addr &&
> -		    (!vma || addr + len <= vm_start_gap(vma)))
> +		if (!vma || addr + len <= vm_start_gap(vma))
>  			return addr;
>  	}
> +
> +get_unmapped_area:
>  	if (mm->get_unmapped_area == arch_get_unmapped_area)
>  		return hugetlb_get_unmapped_area_bottomup(file, addr, len,
>  				pgoff, flags);
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index a99679826846..62285fe77b0f 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma)
>  		return "[mpx]";
>  	return NULL;
>  }
> +
> +/**
> + * mmap_address_hint_valid - Validate the address hint of mmap
> + * @addr:	Address hint
> + * @len:	Mapping length
> + *
> + * Check whether @addr and @addr + @len result in a valid mapping.
> + *
> + * On 32bit this only checks whether @addr + @len is <= TASK_SIZE.
> + *
> + * On 64bit with 5-level page tables another sanity check is required
> + * because mappings requested by mmap(@addr, 0) which cross the 47-bit
> + * virtual address boundary can cause the following theoretical issue:
> + *
> + *  An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr
> + *  is below the border of the 47-bit address space and @addr + @len is
> + *  above the border.
> + *
> + *  With 4-level paging this request succeeds, but the resulting mapping
> + *  address will always be within the 47-bit virtual address space, because
> + *  the hint address does not result in a valid mapping and is
> + *  ignored. Hence applications which are not prepared to handle virtual
> + *  addresses above 47-bit work correctly.
> + *
> + *  With 5-level paging this request would be granted and result in a
> + *  mapping which crosses the border of the 47-bit virtual address
> + *  space. If the application cannot handle addresses above 47-bit this
> + *  will lead to misbehaviour and hard to diagnose failures.
> + *
> + * Therefore ignore address hints which would result in a mapping crossing
> + * the 47-bit virtual address boundary.
> + *
> + * Note, that in the same scenario with MAP_FIXED the behaviour is
> + * different. The request with @addr < 47-bit and @addr + @len > 47-bit
> + * fails on a 4-level paging machine but succeeds on a 5-level paging
> + * machine. It is reasonable to expect that an application does not rely on
> + * the failure of such a fixed mapping request, so the restriction is not
> + * applied.
> + */
> +bool mmap_address_hint_valid(unsigned long addr, unsigned long len)
> +{
> +	if (TASK_SIZE - len < addr)
> +		return false;
> +
> +	return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW);
> +}
> -- 
> 2.15.0
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
  2017-11-15 14:36   ` Kirill A. Shutemov
@ 2017-11-22  5:41     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2017-11-22  5:41 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm,
	linux-kernel, Kirill A. Shutemov

"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:

> With 5-level paging, we have 56-bit virtual address space available for
> userspace. But we don't want to expose userspace to addresses above
> 47-bits, unless it asked specifically for it.
>
> We use mmap(2) hint address as a way for kernel to know if it's okay to
> allocate virtual memory above 47-bit.
>
> Let's add a self-test that covers few corner cases of the interface.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Can we move this to selftest/vm/ ? I had a variant which i was using to
test issues on ppc64. One change we did recently was to use >=128TB as
the hint addr value to select larger address space. I also would like to
check for exact mmap return addr in some case. Attaching below the test
i was using. I will check whether this patch can be updated to test what
is converted in my selftest. I also want to do the boundary check twice.
The hash trasnslation mode in POWER require us to track addr limit and
we had bugs around address space slection before and after updating the
addr limit.

>From 7739eb02bb6b6602572a9c259e915ef23950aae1 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Date: Mon, 13 Nov 2017 10:41:10 +0530
Subject: [PATCH] selftest/mm: Add test for checking mmap across 128TB boundary

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 tools/testing/selftests/vm/Makefile         |   1 +
 tools/testing/selftests/vm/run_vmtests      |  11 ++
 tools/testing/selftests/vm/va_128TBswitch.c | 170 ++++++++++++++++++++++++++++
 3 files changed, 182 insertions(+)
 create mode 100644 tools/testing/selftests/vm/va_128TBswitch.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index cbb29e41ef2b..b1fb3cd7cf52 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -17,6 +17,7 @@ TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
 TEST_GEN_FILES += mlock-random-test
 TEST_GEN_FILES += virtual_address_range
+TEST_GEN_FILES += va_128TBswitch
 
 TEST_PROGS := run_vmtests
 
diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests
index 07548a1fa901..b367f7801b67 100755
--- a/tools/testing/selftests/vm/run_vmtests
+++ b/tools/testing/selftests/vm/run_vmtests
@@ -176,4 +176,15 @@ else
 	echo "[PASS]"
 fi
 
+echo "-----------------------------"
+echo "running virtual address 128TB switch test" 
+echo "-----------------------------"
+./va_128TBswitch
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
 exit $exitcode
diff --git a/tools/testing/selftests/vm/va_128TBswitch.c b/tools/testing/selftests/vm/va_128TBswitch.c
new file mode 100644
index 000000000000..dfa501b825a8
--- /dev/null
+++ b/tools/testing/selftests/vm/va_128TBswitch.c
@@ -0,0 +1,170 @@
+/*
+ * Copyright IBM Corporation, 2017
+ * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+
+#ifdef DEBUG
+#define pr_debug(fmt, ...)  printf(fmt, ##__VA_ARGS__)
+#else
+#define pr_debug(fmt, ...)
+#endif
+
+/*
+ * >= 128TB is the hint addr value we used to select
+ * large address space.
+ */
+#define ADDR_SWITCH_HINT (1UL << 47)
+
+#ifdef __powerpc64__
+#define MAP_SIZE  64*1024
+#else
+#define MAP_SIZE  4*1024
+#endif
+
+
+void report_failure(long in_addr, unsigned long flags)
+{
+	printf("Failed to map 0x%lx with flags 0x%lx\n", in_addr, flags);
+	exit(1);
+}
+
+int *__map_addr(long in_addr, int size, unsigned long flags, int unmap)
+{
+	int *addr;
+
+	addr = (int *)mmap((void *)in_addr, size, PROT_READ | PROT_WRITE,
+			   MAP_ANONYMOUS | MAP_PRIVATE | flags, -1, 0);
+	if (addr == MAP_FAILED)
+		report_failure(in_addr, flags);
+	pr_debug("Mapped addr 0x%lx-0x%lx for request 0x%lx with flag 0x%lx\n",
+		 (unsigned long)addr, ((unsigned long)addr + size), in_addr, flags);
+	/*
+	 * Try to access to catch errors in fault handling/slb miss handling
+	 */
+	*addr = 10;
+	if (unmap)
+		munmap(addr, size);
+	return addr;
+}
+
+int *map_addr(long in_addr, unsigned long flags, int unmap)
+{
+	return __map_addr(in_addr, MAP_SIZE, flags, unmap);
+}
+
+void boundary_check(void)
+{
+	int *a;
+
+	/*
+	 * If stack is moved, we could possibly allocate
+	 * this at the requested address.
+	 */
+	a = map_addr((ADDR_SWITCH_HINT - MAP_SIZE), 0, 1);
+	if ((unsigned long)a > ADDR_SWITCH_HINT - MAP_SIZE)
+		report_failure(ADDR_SWITCH_HINT - MAP_SIZE, 0);
+
+	/*
+	 * We should never allocate at the requested address or above it
+	 * The len cross the 128TB boundary. Without MAP_FIXED
+	 * we will always search in the lower address space.
+	 */
+	a = __map_addr((ADDR_SWITCH_HINT - MAP_SIZE), 2*MAP_SIZE, 0, 1);
+	if ((unsigned long)a >= ADDR_SWITCH_HINT - MAP_SIZE)
+		report_failure(ADDR_SWITCH_HINT - MAP_SIZE, 0);
+	/*
+	 * Exact mapping at 128TB, the area is free we should get that
+	 * even without MAP_FIXED. Don't unmap. We check fixed in the
+	 * same range later.
+	 */
+	a = map_addr(ADDR_SWITCH_HINT, 0, 0);
+	if ((unsigned long)a != ADDR_SWITCH_HINT)
+		report_failure(ADDR_SWITCH_HINT, 0);
+
+	a = map_addr(ADDR_SWITCH_HINT + MAP_SIZE, 0, 1);
+	if ((unsigned long)a < ADDR_SWITCH_HINT)
+		report_failure(ADDR_SWITCH_HINT, 0);
+
+#if 0
+	/*
+	 * Enable this with stack mapped MAP_SIZE below 128TB
+	 */
+	a = map_addr((ADDR_SWITCH_HINT - MAP_SIZE), MAP_FIXED, 1);
+	if ((unsigned long)a != ADDR_SWITCH_HINT - MAP_SIZE)
+		report_failure(ADDR_SWITCH_HINT - MAP_SIZE, 0);
+	a = __map_addr((ADDR_SWITCH_HINT - MAP_SIZE), 2*MAP_SIZE, MAP_FIXED, 1);
+	if ((unsigned long)a != ADDR_SWITCH_HINT - MAP_SIZE)
+		report_failure(ADDR_SWITCH_HINT - MAP_SIZE, MAP_FIXED);
+
+#endif
+	a = map_addr(ADDR_SWITCH_HINT, MAP_FIXED, 1);
+	if ((unsigned long)a != ADDR_SWITCH_HINT)
+		report_failure(ADDR_SWITCH_HINT, MAP_FIXED);
+}
+
+static int supported_arch(void)
+{
+#if defined(__powerpc64__)
+	return 1;
+#elif defined(__x86_64__)
+	return 1;
+#else
+	return 0;
+#endif
+
+}
+
+int main(int argc, char *argv[])
+{
+	int *a;
+
+	if (!supported_arch())
+		return 0;
+	/*
+	 * check the 128TB boundary before we update addr_limit
+	 */
+	boundary_check();
+
+	a = map_addr(0, 0, 1);
+	if ((unsigned long)a > ADDR_SWITCH_HINT)
+		report_failure(0, 0);
+
+	a = map_addr(-1, 0, 1);
+	if ((unsigned long)a < ADDR_SWITCH_HINT)
+		report_failure(-1, 0);
+
+	/* don't unmap this one */
+	a = map_addr((1UL << 48), 0, 0);
+	if ((unsigned long)a != 1UL << 48)
+		report_failure((1UL << 48), 0);
+
+	a = map_addr((1UL << 48), 0, 1);
+	if ((unsigned long)a < ADDR_SWITCH_HINT)
+		report_failure((1UL << 48), 0);
+
+	a = map_addr(0, 0, 1);
+	if ((unsigned long)a > ADDR_SWITCH_HINT)
+		report_failure(0, 0);
+
+	a = map_addr(-1, 0, 1);
+	if ((unsigned long)a < ADDR_SWITCH_HINT)
+		report_failure(-1, 0);
+	/*
+	 * Try boundary conditions again after we allocated something above 128TB
+	 * and updated addr_limit.
+	 */
+	boundary_check();
+}
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
@ 2017-11-22  5:41     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2017-11-22  5:41 UTC (permalink / raw)
  To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin
  Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel

"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:

> With 5-level paging, we have 56-bit virtual address space available for
> userspace. But we don't want to expose userspace to addresses above
> 47-bits, unless it asked specifically for it.
>
> We use mmap(2) hint address as a way for kernel to know if it's okay to
> allocate virtual memory above 47-bit.
>
> Let's add a self-test that covers few corner cases of the interface.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Can we move this to selftest/vm/ ? I had a variant which i was using to
test issues on ppc64. One change we did recently was to use >=128TB as
the hint addr value to select larger address space. I also would like to
check for exact mmap return addr in some case. Attaching below the test
i was using. I will check whether this patch can be updated to test what
is converted in my selftest. I also want to do the boundary check twice.
The hash trasnslation mode in POWER require us to track addr limit and
we had bugs around address space slection before and after updating the
addr limit.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
  2017-11-22  5:41     ` Aneesh Kumar K.V
@ 2017-11-22  8:11       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-22  8:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin,
	linux-mm, linux-kernel

On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> 
> > With 5-level paging, we have 56-bit virtual address space available for
> > userspace. But we don't want to expose userspace to addresses above
> > 47-bits, unless it asked specifically for it.
> >
> > We use mmap(2) hint address as a way for kernel to know if it's okay to
> > allocate virtual memory above 47-bit.
> >
> > Let's add a self-test that covers few corner cases of the interface.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> Can we move this to selftest/vm/ ? I had a variant which i was using to
> test issues on ppc64. One change we did recently was to use >=128TB as
> the hint addr value to select larger address space. I also would like to
> check for exact mmap return addr in some case. Attaching below the test
> i was using. I will check whether this patch can be updated to test what
> is converted in my selftest. I also want to do the boundary check twice.
> The hash trasnslation mode in POWER require us to track addr limit and
> we had bugs around address space slection before and after updating the
> addr limit.

Feel free to move it to selftest/vm. I don't have time to test setup and
test it on Power myself, but this would be great.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
@ 2017-11-22  8:11       ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-22  8:11 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin,
	linux-mm, linux-kernel

On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> 
> > With 5-level paging, we have 56-bit virtual address space available for
> > userspace. But we don't want to expose userspace to addresses above
> > 47-bits, unless it asked specifically for it.
> >
> > We use mmap(2) hint address as a way for kernel to know if it's okay to
> > allocate virtual memory above 47-bit.
> >
> > Let's add a self-test that covers few corner cases of the interface.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> Can we move this to selftest/vm/ ? I had a variant which i was using to
> test issues on ppc64. One change we did recently was to use >=128TB as
> the hint addr value to select larger address space. I also would like to
> check for exact mmap return addr in some case. Attaching below the test
> i was using. I will check whether this patch can be updated to test what
> is converted in my selftest. I also want to do the boundary check twice.
> The hash trasnslation mode in POWER require us to track addr limit and
> we had bugs around address space slection before and after updating the
> addr limit.

Feel free to move it to selftest/vm. I don't have time to test setup and
test it on Power myself, but this would be great.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
  2017-11-22  8:11       ` Kirill A. Shutemov
@ 2017-11-22 11:36         ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2017-11-22 11:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin,
	linux-mm, linux-kernel

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote:
>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>> 
>> > With 5-level paging, we have 56-bit virtual address space available for
>> > userspace. But we don't want to expose userspace to addresses above
>> > 47-bits, unless it asked specifically for it.
>> >
>> > We use mmap(2) hint address as a way for kernel to know if it's okay to
>> > allocate virtual memory above 47-bit.
>> >
>> > Let's add a self-test that covers few corner cases of the interface.
>> >
>> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> 
>> Can we move this to selftest/vm/ ? I had a variant which i was using to
>> test issues on ppc64. One change we did recently was to use >=128TB as
>> the hint addr value to select larger address space. I also would like to
>> check for exact mmap return addr in some case. Attaching below the test
>> i was using. I will check whether this patch can be updated to test what
>> is converted in my selftest. I also want to do the boundary check twice.
>> The hash trasnslation mode in POWER require us to track addr limit and
>> we had bugs around address space slection before and after updating the
>> addr limit.
>
> Feel free to move it to selftest/vm. I don't have time to test setup and
> test it on Power myself, but this would be great.
>

How about the below? Do you want me to send this as a patch to the list? 

#include <stdio.h>
#include <sys/mman.h>
#include <string.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

#ifdef __powerpc64__
#define PAGE_SIZE	64*1024
/*
 * This will work with 16M and 2M hugepage size
 */
#define HUGETLB_SIZE	16*1024*1024
#else
#define PAGE_SIZE	4096
#define HUGETLB_SIZE	2*1024*1024
#endif

/*
 * >= 128TB is the hint addr value we used to select
 * large address space.
 */
#define ADDR_SWITCH_HINT (1UL << 47)
#define LOW_ADDR	((void *) (1UL << 30))
#define HIGH_ADDR	((void *) (1UL << 48))

struct testcase {
	void *addr;
	unsigned long size;
	unsigned long flags;
	const char *msg;
	unsigned int addr_check_cond;
	unsigned int low_addr_required:1;
	unsigned int keep_mapped:1;
};

static struct testcase testcases[] = {
	{
		/*
		 * If stack is moved, we could possibly allocate
		 * this at the requested address.
		 */
		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
		.low_addr_required = 1,
	},
	{
		/*
		 * We should never allocate at the requested address or above it
		 * The len cross the 128TB boundary. Without MAP_FIXED
		 * we will always search in the lower address space.
		 */
		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, (2 * PAGE_SIZE))",
		.low_addr_required = 1,
	},
	{
		/*
		 * Exact mapping at 128TB, the area is free we should get that
		 * even without MAP_FIXED.
		 */
		.addr = ((void *)(ADDR_SWITCH_HINT)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
		.keep_mapped = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
	},
	{
		.addr = NULL,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(NULL)",
		.low_addr_required = 1,
	},
	{
		.addr = LOW_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(LOW_ADDR)",
		.low_addr_required = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR)",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR) again",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
	},
	{
		.addr = (void*) -1,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1)",
		.keep_mapped = 1,
	},
	{
		.addr = (void*) -1,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1) again",
	},
	{
		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
		.low_addr_required = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2 * PAGE_SIZE)",
		.low_addr_required = 1,
		.keep_mapped = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE / 2),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE/2 , 2 * PAGE_SIZE)",
		.low_addr_required = 1,
		.keep_mapped = 1,
	},
	{
		.addr = ((void *)(ADDR_SWITCH_HINT)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
	},
};

static struct testcase hugetlb_testcases[] = {
	{
		.addr = NULL,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(NULL, MAP_HUGETLB)",
		.low_addr_required = 1,
	},
	{
		.addr = LOW_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
		.low_addr_required = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
	},
	{
		.addr = (void*) -1,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1, MAP_HUGETLB)",
		.keep_mapped = 1,
	},
	{
		.addr = (void*) -1,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1, MAP_HUGETLB) again",
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
		.size = 2 * HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
		.low_addr_required = 1,
		.keep_mapped = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT),
		.size = 2 * HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(ADDR_SWITCH_HINT , 2 * HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB)",
	},
};

static void run_test(struct testcase *test, int count)
{
	int i;
	void *p;

	for (i = 0; i < count; i++) {
		struct testcase *t = test + i;

		p = mmap(t->addr, t->size, PROT_READ | PROT_WRITE, t->flags, -1, 0);

		printf("%s: %p - ", t->msg, p);

		if (p == MAP_FAILED) {
			printf("FAILED\n");
			continue;
		}

		if (t->low_addr_required && p >= (void *)(1UL << 47))
			printf("FAILED\n");
		else {
			/*
			 * Do a dereference of the address returned so that we catch
			 * bugs in page fault handling
			 */
			*(int *)p = 10;
			printf("OK\n");
		}
		if (!t->keep_mapped)
			munmap(p, t->size);
	}
}

static int supported_arch(void)
{
#if defined(__powerpc64__)
	return 1;
#elif defined(__x86_64__)
	return 1;
#else
	return 0;
#endif
}

int main(int argc, char **argv)
{
	if (!supported_arch())
		return 0;

	run_test(testcases, ARRAY_SIZE(testcases));
	if (argc == 2 && !strcmp(argv[1], "--run_hugetlb"))
		run_test(hugetlb_testcases, ARRAY_SIZE(hugetlb_testcases));
	return 0;
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
@ 2017-11-22 11:36         ` Aneesh Kumar K.V
  0 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2017-11-22 11:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin,
	linux-mm, linux-kernel

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote:
>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>> 
>> > With 5-level paging, we have 56-bit virtual address space available for
>> > userspace. But we don't want to expose userspace to addresses above
>> > 47-bits, unless it asked specifically for it.
>> >
>> > We use mmap(2) hint address as a way for kernel to know if it's okay to
>> > allocate virtual memory above 47-bit.
>> >
>> > Let's add a self-test that covers few corner cases of the interface.
>> >
>> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> 
>> Can we move this to selftest/vm/ ? I had a variant which i was using to
>> test issues on ppc64. One change we did recently was to use >=128TB as
>> the hint addr value to select larger address space. I also would like to
>> check for exact mmap return addr in some case. Attaching below the test
>> i was using. I will check whether this patch can be updated to test what
>> is converted in my selftest. I also want to do the boundary check twice.
>> The hash trasnslation mode in POWER require us to track addr limit and
>> we had bugs around address space slection before and after updating the
>> addr limit.
>
> Feel free to move it to selftest/vm. I don't have time to test setup and
> test it on Power myself, but this would be great.
>

How about the below? Do you want me to send this as a patch to the list? 

#include <stdio.h>
#include <sys/mman.h>
#include <string.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

#ifdef __powerpc64__
#define PAGE_SIZE	64*1024
/*
 * This will work with 16M and 2M hugepage size
 */
#define HUGETLB_SIZE	16*1024*1024
#else
#define PAGE_SIZE	4096
#define HUGETLB_SIZE	2*1024*1024
#endif

/*
 * >= 128TB is the hint addr value we used to select
 * large address space.
 */
#define ADDR_SWITCH_HINT (1UL << 47)
#define LOW_ADDR	((void *) (1UL << 30))
#define HIGH_ADDR	((void *) (1UL << 48))

struct testcase {
	void *addr;
	unsigned long size;
	unsigned long flags;
	const char *msg;
	unsigned int addr_check_cond;
	unsigned int low_addr_required:1;
	unsigned int keep_mapped:1;
};

static struct testcase testcases[] = {
	{
		/*
		 * If stack is moved, we could possibly allocate
		 * this at the requested address.
		 */
		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
		.low_addr_required = 1,
	},
	{
		/*
		 * We should never allocate at the requested address or above it
		 * The len cross the 128TB boundary. Without MAP_FIXED
		 * we will always search in the lower address space.
		 */
		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, (2 * PAGE_SIZE))",
		.low_addr_required = 1,
	},
	{
		/*
		 * Exact mapping at 128TB, the area is free we should get that
		 * even without MAP_FIXED.
		 */
		.addr = ((void *)(ADDR_SWITCH_HINT)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
		.keep_mapped = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
	},
	{
		.addr = NULL,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(NULL)",
		.low_addr_required = 1,
	},
	{
		.addr = LOW_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(LOW_ADDR)",
		.low_addr_required = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR)",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR) again",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
	},
	{
		.addr = (void*) -1,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1)",
		.keep_mapped = 1,
	},
	{
		.addr = (void*) -1,
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1) again",
	},
	{
		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
		.low_addr_required = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2 * PAGE_SIZE)",
		.low_addr_required = 1,
		.keep_mapped = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE / 2),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE/2 , 2 * PAGE_SIZE)",
		.low_addr_required = 1,
		.keep_mapped = 1,
	},
	{
		.addr = ((void *)(ADDR_SWITCH_HINT)),
		.size = PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT),
		.size = 2 * PAGE_SIZE,
		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
	},
};

static struct testcase hugetlb_testcases[] = {
	{
		.addr = NULL,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(NULL, MAP_HUGETLB)",
		.low_addr_required = 1,
	},
	{
		.addr = LOW_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
		.low_addr_required = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
		.keep_mapped = 1,
	},
	{
		.addr = HIGH_ADDR,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
	},
	{
		.addr = (void*) -1,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1, MAP_HUGETLB)",
		.keep_mapped = 1,
	},
	{
		.addr = (void*) -1,
		.size = HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap(-1, MAP_HUGETLB) again",
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
		.size = 2 * HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
		.low_addr_required = 1,
		.keep_mapped = 1,
	},
	{
		.addr = (void *)(ADDR_SWITCH_HINT),
		.size = 2 * HUGETLB_SIZE,
		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
		.msg = "mmap(ADDR_SWITCH_HINT , 2 * HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB)",
	},
};

static void run_test(struct testcase *test, int count)
{
	int i;
	void *p;

	for (i = 0; i < count; i++) {
		struct testcase *t = test + i;

		p = mmap(t->addr, t->size, PROT_READ | PROT_WRITE, t->flags, -1, 0);

		printf("%s: %p - ", t->msg, p);

		if (p == MAP_FAILED) {
			printf("FAILED\n");
			continue;
		}

		if (t->low_addr_required && p >= (void *)(1UL << 47))
			printf("FAILED\n");
		else {
			/*
			 * Do a dereference of the address returned so that we catch
			 * bugs in page fault handling
			 */
			*(int *)p = 10;
			printf("OK\n");
		}
		if (!t->keep_mapped)
			munmap(p, t->size);
	}
}

static int supported_arch(void)
{
#if defined(__powerpc64__)
	return 1;
#elif defined(__x86_64__)
	return 1;
#else
	return 0;
#endif
}

int main(int argc, char **argv)
{
	if (!supported_arch())
		return 0;

	run_test(testcases, ARRAY_SIZE(testcases));
	if (argc == 2 && !strcmp(argv[1], "--run_hugetlb"))
		run_test(hugetlb_testcases, ARRAY_SIZE(hugetlb_testcases));
	return 0;
}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
  2017-11-22 11:36         ` Aneesh Kumar K.V
@ 2017-11-22 12:40           ` Kirill A. Shutemov
  -1 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-22 12:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin,
	linux-mm, linux-kernel

On Wed, Nov 22, 2017 at 05:06:27PM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote:
> >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> >> 
> >> > With 5-level paging, we have 56-bit virtual address space available for
> >> > userspace. But we don't want to expose userspace to addresses above
> >> > 47-bits, unless it asked specifically for it.
> >> >
> >> > We use mmap(2) hint address as a way for kernel to know if it's okay to
> >> > allocate virtual memory above 47-bit.
> >> >
> >> > Let's add a self-test that covers few corner cases of the interface.
> >> >
> >> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> 
> >> Can we move this to selftest/vm/ ? I had a variant which i was using to
> >> test issues on ppc64. One change we did recently was to use >=128TB as
> >> the hint addr value to select larger address space. I also would like to
> >> check for exact mmap return addr in some case. Attaching below the test
> >> i was using. I will check whether this patch can be updated to test what
> >> is converted in my selftest. I also want to do the boundary check twice.
> >> The hash trasnslation mode in POWER require us to track addr limit and
> >> we had bugs around address space slection before and after updating the
> >> addr limit.
> >
> > Feel free to move it to selftest/vm. I don't have time to test setup and
> > test it on Power myself, but this would be great.
> >
> 
> How about the below? Do you want me to send this as a patch to the list? 

Yes, please. It actually triggered one hugetlb bug I made (patch sent).

Some feedback below.

> #include <stdio.h>
> #include <sys/mman.h>
> #include <string.h>
> 
> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
> 
> #ifdef __powerpc64__
> #define PAGE_SIZE	64*1024
> /*
>  * This will work with 16M and 2M hugepage size
>  */
> #define HUGETLB_SIZE	16*1024*1024
> #else
> #define PAGE_SIZE	4096
> #define HUGETLB_SIZE	2*1024*1024
> #endif
> 
> /*
>  * >= 128TB is the hint addr value we used to select
>  * large address space.
>  */
> #define ADDR_SWITCH_HINT (1UL << 47)
> #define LOW_ADDR	((void *) (1UL << 30))
> #define HIGH_ADDR	((void *) (1UL << 48))
> 
> struct testcase {
> 	void *addr;
> 	unsigned long size;
> 	unsigned long flags;
> 	const char *msg;
> 	unsigned int addr_check_cond;

Unused?

> 	unsigned int low_addr_required:1;
> 	unsigned int keep_mapped:1;
> };
> 
> static struct testcase testcases[] = {
> 	{
> 		/*
> 		 * If stack is moved, we could possibly allocate
> 		 * this at the requested address.
> 		 */
> 		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		/*
> 		 * We should never allocate at the requested address or above it
> 		 * The len cross the 128TB boundary. Without MAP_FIXED
> 		 * we will always search in the lower address space.
> 		 */
> 		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, (2 * PAGE_SIZE))",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		/*
> 		 * Exact mapping at 128TB, the area is free we should get that
> 		 * even without MAP_FIXED.
> 		 */
> 		.addr = ((void *)(ADDR_SWITCH_HINT)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
> 	},
> 	{
> 		.addr = NULL,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(NULL)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = LOW_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(LOW_ADDR)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR) again",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1) again",
> 	},
> 	{
> 		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2 * PAGE_SIZE)",
> 		.low_addr_required = 1,
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE / 2),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE/2 , 2 * PAGE_SIZE)",
> 		.low_addr_required = 1,
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = ((void *)(ADDR_SWITCH_HINT)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
> 	},
> };
> 
> static struct testcase hugetlb_testcases[] = {
> 	{
> 		.addr = NULL,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(NULL, MAP_HUGETLB)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = LOW_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1, MAP_HUGETLB)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1, MAP_HUGETLB) again",
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
> 		.size = 2 * HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
> 		.low_addr_required = 1,
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT),
> 		.size = 2 * HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(ADDR_SWITCH_HINT , 2 * HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB)",
> 	},
> };
> 
> static void run_test(struct testcase *test, int count)
> {
> 	int i;
> 	void *p;
> 
> 	for (i = 0; i < count; i++) {
> 		struct testcase *t = test + i;
> 
> 		p = mmap(t->addr, t->size, PROT_READ | PROT_WRITE, t->flags, -1, 0);
> 
> 		printf("%s: %p - ", t->msg, p);
> 
> 		if (p == MAP_FAILED) {
> 			printf("FAILED\n");
> 			continue;
> 		}
> 
> 		if (t->low_addr_required && p >= (void *)(1UL << 47))
> 			printf("FAILED\n");
> 		else {
> 			/*
> 			 * Do a dereference of the address returned so that we catch
> 			 * bugs in page fault handling
> 			 */
> 			*(int *)p = 10;
> 			printf("OK\n");
> 		}
> 		if (!t->keep_mapped)
> 			munmap(p, t->size);
> 	}
> }
> 
> static int supported_arch(void)
> {
> #if defined(__powerpc64__)
> 	return 1;
> #elif defined(__x86_64__)
> 	return 1;
> #else
> 	return 0;
> #endif
> }
> 
> int main(int argc, char **argv)
> {
> 	if (!supported_arch())
> 		return 0;
> 
> 	run_test(testcases, ARRAY_SIZE(testcases));
> 	if (argc == 2 && !strcmp(argv[1], "--run_hugetlb"))

--run-hugetlb would be better I think.

> 		run_test(hugetlb_testcases, ARRAY_SIZE(hugetlb_testcases));
> 	return 0;
> }
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging
@ 2017-11-22 12:40           ` Kirill A. Shutemov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill A. Shutemov @ 2017-11-22 12:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner,
	H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin,
	linux-mm, linux-kernel

On Wed, Nov 22, 2017 at 05:06:27PM +0530, Aneesh Kumar K.V wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote:
> >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> >> 
> >> > With 5-level paging, we have 56-bit virtual address space available for
> >> > userspace. But we don't want to expose userspace to addresses above
> >> > 47-bits, unless it asked specifically for it.
> >> >
> >> > We use mmap(2) hint address as a way for kernel to know if it's okay to
> >> > allocate virtual memory above 47-bit.
> >> >
> >> > Let's add a self-test that covers few corner cases of the interface.
> >> >
> >> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> 
> >> Can we move this to selftest/vm/ ? I had a variant which i was using to
> >> test issues on ppc64. One change we did recently was to use >=128TB as
> >> the hint addr value to select larger address space. I also would like to
> >> check for exact mmap return addr in some case. Attaching below the test
> >> i was using. I will check whether this patch can be updated to test what
> >> is converted in my selftest. I also want to do the boundary check twice.
> >> The hash trasnslation mode in POWER require us to track addr limit and
> >> we had bugs around address space slection before and after updating the
> >> addr limit.
> >
> > Feel free to move it to selftest/vm. I don't have time to test setup and
> > test it on Power myself, but this would be great.
> >
> 
> How about the below? Do you want me to send this as a patch to the list? 

Yes, please. It actually triggered one hugetlb bug I made (patch sent).

Some feedback below.

> #include <stdio.h>
> #include <sys/mman.h>
> #include <string.h>
> 
> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
> 
> #ifdef __powerpc64__
> #define PAGE_SIZE	64*1024
> /*
>  * This will work with 16M and 2M hugepage size
>  */
> #define HUGETLB_SIZE	16*1024*1024
> #else
> #define PAGE_SIZE	4096
> #define HUGETLB_SIZE	2*1024*1024
> #endif
> 
> /*
>  * >= 128TB is the hint addr value we used to select
>  * large address space.
>  */
> #define ADDR_SWITCH_HINT (1UL << 47)
> #define LOW_ADDR	((void *) (1UL << 30))
> #define HIGH_ADDR	((void *) (1UL << 48))
> 
> struct testcase {
> 	void *addr;
> 	unsigned long size;
> 	unsigned long flags;
> 	const char *msg;
> 	unsigned int addr_check_cond;

Unused?

> 	unsigned int low_addr_required:1;
> 	unsigned int keep_mapped:1;
> };
> 
> static struct testcase testcases[] = {
> 	{
> 		/*
> 		 * If stack is moved, we could possibly allocate
> 		 * this at the requested address.
> 		 */
> 		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		/*
> 		 * We should never allocate at the requested address or above it
> 		 * The len cross the 128TB boundary. Without MAP_FIXED
> 		 * we will always search in the lower address space.
> 		 */
> 		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, (2 * PAGE_SIZE))",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		/*
> 		 * Exact mapping at 128TB, the area is free we should get that
> 		 * even without MAP_FIXED.
> 		 */
> 		.addr = ((void *)(ADDR_SWITCH_HINT)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
> 	},
> 	{
> 		.addr = NULL,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(NULL)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = LOW_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(LOW_ADDR)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR) again",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(HIGH_ADDR, MAP_FIXED)",
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1) again",
> 	},
> 	{
> 		.addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2 * PAGE_SIZE)",
> 		.low_addr_required = 1,
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE / 2),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE/2 , 2 * PAGE_SIZE)",
> 		.low_addr_required = 1,
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = ((void *)(ADDR_SWITCH_HINT)),
> 		.size = PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)",
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT),
> 		.size = 2 * PAGE_SIZE,
> 		.flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)",
> 	},
> };
> 
> static struct testcase hugetlb_testcases[] = {
> 	{
> 		.addr = NULL,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(NULL, MAP_HUGETLB)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = LOW_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(LOW_ADDR, MAP_HUGETLB)",
> 		.low_addr_required = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = HIGH_ADDR,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)",
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1, MAP_HUGETLB)",
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void*) -1,
> 		.size = HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap(-1, MAP_HUGETLB) again",
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE),
> 		.size = 2 * HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
> 		.msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)",
> 		.low_addr_required = 1,
> 		.keep_mapped = 1,
> 	},
> 	{
> 		.addr = (void *)(ADDR_SWITCH_HINT),
> 		.size = 2 * HUGETLB_SIZE,
> 		.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
> 		.msg = "mmap(ADDR_SWITCH_HINT , 2 * HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB)",
> 	},
> };
> 
> static void run_test(struct testcase *test, int count)
> {
> 	int i;
> 	void *p;
> 
> 	for (i = 0; i < count; i++) {
> 		struct testcase *t = test + i;
> 
> 		p = mmap(t->addr, t->size, PROT_READ | PROT_WRITE, t->flags, -1, 0);
> 
> 		printf("%s: %p - ", t->msg, p);
> 
> 		if (p == MAP_FAILED) {
> 			printf("FAILED\n");
> 			continue;
> 		}
> 
> 		if (t->low_addr_required && p >= (void *)(1UL << 47))
> 			printf("FAILED\n");
> 		else {
> 			/*
> 			 * Do a dereference of the address returned so that we catch
> 			 * bugs in page fault handling
> 			 */
> 			*(int *)p = 10;
> 			printf("OK\n");
> 		}
> 		if (!t->keep_mapped)
> 			munmap(p, t->size);
> 	}
> }
> 
> static int supported_arch(void)
> {
> #if defined(__powerpc64__)
> 	return 1;
> #elif defined(__x86_64__)
> 	return 1;
> #else
> 	return 0;
> #endif
> }
> 
> int main(int argc, char **argv)
> {
> 	if (!supported_arch())
> 		return 0;
> 
> 	run_test(testcases, ARRAY_SIZE(testcases));
> 	if (argc == 2 && !strcmp(argv[1], "--run_hugetlb"))

--run-hugetlb would be better I think.

> 		run_test(hugetlb_testcases, ARRAY_SIZE(hugetlb_testcases));
> 	return 0;
> }
> 

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-11-22 12:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-15 14:36 [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border Kirill A. Shutemov
2017-11-15 14:36 ` Kirill A. Shutemov
2017-11-15 14:36 ` [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging Kirill A. Shutemov
2017-11-15 14:36   ` Kirill A. Shutemov
2017-11-16 10:46   ` [tip:x86/urgent] " tip-bot for Kirill A. Shutemov
2017-11-22  5:41   ` [PATCHv3 2/2] " Aneesh Kumar K.V
2017-11-22  5:41     ` Aneesh Kumar K.V
2017-11-22  8:11     ` Kirill A. Shutemov
2017-11-22  8:11       ` Kirill A. Shutemov
2017-11-22 11:36       ` Aneesh Kumar K.V
2017-11-22 11:36         ` Aneesh Kumar K.V
2017-11-22 12:40         ` Kirill A. Shutemov
2017-11-22 12:40           ` Kirill A. Shutemov
2017-11-16 10:46 ` [tip:x86/urgent] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border tip-bot for Kirill A. Shutemov
2017-11-16 13:23 ` [PATCHv3 1/2] " Michal Hocko
2017-11-16 13:23   ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.