* [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border @ 2017-11-15 14:36 ` Kirill A. Shutemov 0 siblings, 0 replies; 16+ messages in thread From: Kirill A. Shutemov @ 2017-11-15 14:36 UTC (permalink / raw) To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel, Kirill A. Shutemov In case of 5-level paging, the kernel does not place any mapping above 47-bit, unless userspace explicitly asks for it. Userspace can request an allocation from the full address space by specifying the mmap address hint above 47-bit. Nicholas noticed that the current implementation violates this interface: If user space requests a mapping at the end of the 47-bit address space with a length which causes the mapping to cross the 47-bit border (DEFAULT_MAP_WINDOW), then the vma is partially in the address space below and above. Sanity check the mmap address hint so that start and end of the resulting vma are on the same side of the 47-bit border. If that's not the case fall back to the code path which ignores the address hint and allocate from the regular address space below 47-bit. [ tglx: Moved the address check to a function and massaged comment and changelog ] Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Kirill A. 
Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/include/asm/elf.h | 1 + arch/x86/kernel/sys_x86_64.c | 10 +++++++--- arch/x86/mm/hugetlbpage.c | 11 ++++++++--- arch/x86/mm/mmap.c | 46 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 62 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 3a091cea36c5..0d157d2a1e2a 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void) extern unsigned long task_size_32bit(void); extern unsigned long task_size_64bit(int full_addr_space); extern unsigned long get_mmap_base(int is_legacy); +extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len); #ifdef CONFIG_X86_32 diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c index a63fe77b3217..676774b9bb8d 100644 --- a/arch/x86/kernel/sys_x86_64.c +++ b/arch/x86/kernel/sys_x86_64.c @@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (len > TASK_SIZE) return -ENOMEM; + /* No address checking. 
See comment at mmap_address_hint_valid() */ if (flags & MAP_FIXED) return addr; @@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, /* requesting a specific address */ if (addr) { - addr = PAGE_ALIGN(addr); + addr &= PAGE_MASK; + if (!mmap_address_hint_valid(addr, len)) + goto get_unmapped_area; + vma = find_vma(mm, addr); - if (TASK_SIZE - len >= addr && - (!vma || addr + len <= vm_start_gap(vma))) + if (!vma || addr + len <= vm_start_gap(vma)) return addr; } +get_unmapped_area: info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index 8ae0000cbdb3..00b296617ca4 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, if (len > TASK_SIZE) return -ENOMEM; + /* No address checking. See comment at mmap_address_hint_valid() */ if (flags & MAP_FIXED) { if (prepare_hugepage_range(file, addr, len)) return -EINVAL; @@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, } if (addr) { - addr = ALIGN(addr, huge_page_size(h)); + addr &= huge_page_mask(h); + if (!mmap_address_hint_valid(addr, len)) + goto get_unmapped_area; + vma = find_vma(mm, addr); - if (TASK_SIZE - len >= addr && - (!vma || addr + len <= vm_start_gap(vma))) + if (!vma || addr + len <= vm_start_gap(vma)) return addr; } + +get_unmapped_area: if (mm->get_unmapped_area == arch_get_unmapped_area) return hugetlb_get_unmapped_area_bottomup(file, addr, len, pgoff, flags); diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index a99679826846..62285fe77b0f 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma) return "[mpx]"; return NULL; } + +/** + * mmap_address_hint_valid - Validate the address hint of mmap + * @addr: Address hint + * @len: Mapping length + * + * Check whether @addr and 
@addr + @len result in a valid mapping. + * + * On 32bit this only checks whether @addr + @len is <= TASK_SIZE. + * + * On 64bit with 5-level page tables another sanity check is required + * because mappings requested by mmap(@addr, 0) which cross the 47-bit + * virtual address boundary can cause the following theoretical issue: + * + * An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr + * is below the border of the 47-bit address space and @addr + @len is + * above the border. + * + * With 4-level paging this request succeeds, but the resulting mapping + * address will always be within the 47-bit virtual address space, because + * the hint address does not result in a valid mapping and is + * ignored. Hence applications which are not prepared to handle virtual + * addresses above 47-bit work correctly. + * + * With 5-level paging this request would be granted and result in a + * mapping which crosses the border of the 47-bit virtual address + * space. If the application cannot handle addresses above 47-bit this + * will lead to misbehaviour and hard to diagnose failures. + * + * Therefore ignore address hints which would result in a mapping crossing + * the 47-bit virtual address boundary. + * + * Note, that in the same scenario with MAP_FIXED the behaviour is + * different. The request with @addr < 47-bit and @addr + @len > 47-bit + * fails on a 4-level paging machine but succeeds on a 5-level paging + * machine. It is reasonable to expect that an application does not rely on + * the failure of such a fixed mapping request, so the restriction is not + * applied. + */ +bool mmap_address_hint_valid(unsigned long addr, unsigned long len) +{ + if (TASK_SIZE - len < addr) + return false; + + return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW); +} -- 2.15.0
* [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging 2017-11-15 14:36 ` Kirill A. Shutemov @ 2017-11-15 14:36 ` Kirill A. Shutemov -1 siblings, 0 replies; 16+ messages in thread From: Kirill A. Shutemov @ 2017-11-15 14:36 UTC (permalink / raw) To: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel, Kirill A. Shutemov With 5-level paging, we have 56-bit virtual address space available for userspace. But we don't want to expose userspace to addresses above 47-bits, unless it asked specifically for it. We use mmap(2) hint address as a way for kernel to know if it's okay to allocate virtual memory above 47-bit. Let's add a self-test that covers few corner cases of the interface. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> --- tools/testing/selftests/x86/5lvl.c | 177 +++++++++++++++++++++++++++++++++++ tools/testing/selftests/x86/Makefile | 2 +- 2 files changed, 178 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/5lvl.c diff --git a/tools/testing/selftests/x86/5lvl.c b/tools/testing/selftests/x86/5lvl.c new file mode 100644 index 000000000000..2eafdcd4c2b3 --- /dev/null +++ b/tools/testing/selftests/x86/5lvl.c @@ -0,0 +1,177 @@ +#include <stdio.h> +#include <sys/mman.h> + +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) + +#define PAGE_SIZE 4096 +#define LOW_ADDR ((void *) (1UL << 30)) +#define HIGH_ADDR ((void *) (1UL << 50)) + +struct testcase { + void *addr; + unsigned long size; + unsigned long flags; + const char *msg; + unsigned int low_addr_required:1; + unsigned int keep_mapped:1; +}; + +static struct testcase testcases[] = { + { + .addr = NULL, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(NULL)", + .low_addr_required = 1, + }, + { + .addr = LOW_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(LOW_ADDR)", + .low_addr_required = 
1, + }, + { + .addr = HIGH_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR)", + .keep_mapped = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR) again", + .keep_mapped = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap(HIGH_ADDR, MAP_FIXED)", + }, + { + .addr = (void*) -1, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1)", + .keep_mapped = 1, + }, + { + .addr = (void*) -1, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1) again", + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE), + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap((1UL << 47), 2 * PAGE_SIZE)", + .low_addr_required = 1, + .keep_mapped = 1, + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE / 2), + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap((1UL << 47), 2 * PAGE_SIZE / 2)", + .low_addr_required = 1, + .keep_mapped = 1, + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE), + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap((1UL << 47) - PAGE_SIZE, 2 * PAGE_SIZE, MAP_FIXED)", + }, + { + .addr = NULL, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(NULL, MAP_HUGETLB)", + .low_addr_required = 1, + }, + { + .addr = LOW_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(LOW_ADDR, MAP_HUGETLB)", + .low_addr_required = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR, MAP_HUGETLB)", + .keep_mapped = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again", + .keep_mapped = 1, + 
}, + { + .addr = HIGH_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)", + }, + { + .addr = (void*) -1, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1, MAP_HUGETLB)", + .keep_mapped = 1, + }, + { + .addr = (void*) -1, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1, MAP_HUGETLB) again", + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE), + .size = 4UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)", + .low_addr_required = 1, + .keep_mapped = 1, + }, + { + .addr = (void *)((1UL << 47) - (2UL << 20)), + .size = 4UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap((1UL << 47) - (2UL << 20), 4UL << 20, MAP_FIXED | MAP_HUGETLB)", + }, +}; + +int main(int argc, char **argv) +{ + int i; + void *p; + + for (i = 0; i < ARRAY_SIZE(testcases); i++) { + struct testcase *t = testcases + i; + + p = mmap(t->addr, t->size, PROT_NONE, t->flags, -1, 0); + + printf("%s: %p - ", t->msg, p); + + if (p == MAP_FAILED) { + printf("FAILED\n"); + continue; + } + + if (t->low_addr_required && p >= (void *)(1UL << 47)) + printf("FAILED\n"); + else + printf("OK\n"); + if (!t->keep_mapped) + munmap(p, t->size); + } + return 0; +} diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 7b1adeee4b0f..939a337128db 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -11,7 +11,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer -TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip +TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip 5lvl TARGETS_C_32BIT_ALL := 
$(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY) TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY) -- 2.15.0
* [tip:x86/urgent] x86/selftests: Add test for mapping placement for 5-level paging 2017-11-15 14:36 ` Kirill A. Shutemov @ 2017-11-16 10:46 ` tip-bot for Kirill A. Shutemov -1 siblings, 0 replies; 16+ messages in thread From: tip-bot for Kirill A. Shutemov @ 2017-11-16 10:46 UTC (permalink / raw) To: linux-tip-commits Cc: npiggin, luto, kirill.shutemov, hpa, torvalds, tglx, mingo, linux-kernel Commit-ID: 97f404ad3e53bf9ac598745066ba2f57c1da3039 Gitweb: https://git.kernel.org/tip/97f404ad3e53bf9ac598745066ba2f57c1da3039 Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> AuthorDate: Wed, 15 Nov 2017 17:36:07 +0300 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 16 Nov 2017 11:43:12 +0100 x86/selftests: Add test for mapping placement for 5-level paging 5-level paging provides a 56-bit virtual address space for user space applications. But the kernel defaults to mappings below the 47-bit address space boundary, which is the upper bound for 4-level paging, unless an application explicitly requests it by using a mmap(2) address hint above the 47-bit boundary. The kernel prevents mappings which span the 47-bit boundary unless mmap(2) was invoked with MAP_FIXED. Add a self-test that covers the corner cases of the interface and validates the correctness of the implementation. [ tglx: Massaged changelog once more ] Signed-off-by: Kirill A.
Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20171115143607.81541-2-kirill.shutemov@linux.intel.com --- tools/testing/selftests/x86/5lvl.c | 177 +++++++++++++++++++++++++++++++++++ tools/testing/selftests/x86/Makefile | 2 +- 2 files changed, 178 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/x86/5lvl.c b/tools/testing/selftests/x86/5lvl.c new file mode 100644 index 0000000..2eafdcd --- /dev/null +++ b/tools/testing/selftests/x86/5lvl.c @@ -0,0 +1,177 @@ +#include <stdio.h> +#include <sys/mman.h> + +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) + +#define PAGE_SIZE 4096 +#define LOW_ADDR ((void *) (1UL << 30)) +#define HIGH_ADDR ((void *) (1UL << 50)) + +struct testcase { + void *addr; + unsigned long size; + unsigned long flags; + const char *msg; + unsigned int low_addr_required:1; + unsigned int keep_mapped:1; +}; + +static struct testcase testcases[] = { + { + .addr = NULL, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(NULL)", + .low_addr_required = 1, + }, + { + .addr = LOW_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(LOW_ADDR)", + .low_addr_required = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR)", + .keep_mapped = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR) again", + .keep_mapped = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap(HIGH_ADDR, MAP_FIXED)", + }, + { + .addr = (void*) -1, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1)", + .keep_mapped = 1, + }, 
+ { + .addr = (void*) -1, + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1) again", + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE), + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap((1UL << 47), 2 * PAGE_SIZE)", + .low_addr_required = 1, + .keep_mapped = 1, + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE / 2), + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap((1UL << 47), 2 * PAGE_SIZE / 2)", + .low_addr_required = 1, + .keep_mapped = 1, + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE), + .size = 2 * PAGE_SIZE, + .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap((1UL << 47) - PAGE_SIZE, 2 * PAGE_SIZE, MAP_FIXED)", + }, + { + .addr = NULL, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(NULL, MAP_HUGETLB)", + .low_addr_required = 1, + }, + { + .addr = LOW_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(LOW_ADDR, MAP_HUGETLB)", + .low_addr_required = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR, MAP_HUGETLB)", + .keep_mapped = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again", + .keep_mapped = 1, + }, + { + .addr = HIGH_ADDR, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)", + }, + { + .addr = (void*) -1, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1, MAP_HUGETLB)", + .keep_mapped = 1, + }, + { + .addr = (void*) -1, + .size = 2UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap(-1, MAP_HUGETLB) again", + }, + { + .addr = (void *)((1UL << 47) - PAGE_SIZE), + .size = 4UL << 20, + .flags = MAP_HUGETLB | 
MAP_PRIVATE | MAP_ANONYMOUS, + .msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)", + .low_addr_required = 1, + .keep_mapped = 1, + }, + { + .addr = (void *)((1UL << 47) - (2UL << 20)), + .size = 4UL << 20, + .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, + .msg = "mmap((1UL << 47) - (2UL << 20), 4UL << 20, MAP_FIXED | MAP_HUGETLB)", + }, +}; + +int main(int argc, char **argv) +{ + int i; + void *p; + + for (i = 0; i < ARRAY_SIZE(testcases); i++) { + struct testcase *t = testcases + i; + + p = mmap(t->addr, t->size, PROT_NONE, t->flags, -1, 0); + + printf("%s: %p - ", t->msg, p); + + if (p == MAP_FAILED) { + printf("FAILED\n"); + continue; + } + + if (t->low_addr_required && p >= (void *)(1UL << 47)) + printf("FAILED\n"); + else + printf("OK\n"); + if (!t->keep_mapped) + munmap(p, t->size); + } + return 0; +} diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 7b1adee..939a337 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -11,7 +11,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer -TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip +TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip 5lvl TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY) TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY) ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging 2017-11-15 14:36 ` Kirill A. Shutemov @ 2017-11-22 5:41 ` Aneesh Kumar K.V 1 sibling, 0 replies; 16+ messages in thread From: Aneesh Kumar K.V @ 2017-11-22 5:41 UTC (permalink / raw) To: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin Cc: Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel, Kirill A. Shutemov "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > With 5-level paging, we have 56-bit virtual address space available for > userspace. But we don't want to expose userspace to addresses above > 47-bits, unless it asked specifically for it. > > We use mmap(2) hint address as a way for kernel to know if it's okay to > allocate virtual memory above 47-bit. > > Let's add a self-test that covers few corner cases of the interface. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Can we move this to selftest/vm/? I had a variant which I was using to test issues on ppc64. One change we did recently was to use >=128TB as the hint addr value to select the larger address space. I would also like to check for the exact mmap return addr in some cases. Attaching below the test I was using. I will check whether this patch can be updated to test what is covered in my selftest. I also want to do the boundary check twice. The hash translation mode on POWER requires us to track the addr limit, and we had bugs around address space selection before and after updating the addr limit. 
>From 7739eb02bb6b6602572a9c259e915ef23950aae1 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Date: Mon, 13 Nov 2017 10:41:10 +0530 Subject: [PATCH] selftest/mm: Add test for checking mmap across 128TB boundary Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> --- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/run_vmtests | 11 ++ tools/testing/selftests/vm/va_128TBswitch.c | 170 ++++++++++++++++++++++++++++ 3 files changed, 182 insertions(+) create mode 100644 tools/testing/selftests/vm/va_128TBswitch.c diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index cbb29e41ef2b..b1fb3cd7cf52 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -17,6 +17,7 @@ TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd TEST_GEN_FILES += mlock-random-test TEST_GEN_FILES += virtual_address_range +TEST_GEN_FILES += va_128TBswitch TEST_PROGS := run_vmtests diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests index 07548a1fa901..b367f7801b67 100755 --- a/tools/testing/selftests/vm/run_vmtests +++ b/tools/testing/selftests/vm/run_vmtests @@ -176,4 +176,15 @@ else echo "[PASS]" fi +echo "-----------------------------" +echo "running virtual address 128TB switch test" +echo "-----------------------------" +./va_128TBswitch +if [ $? 
-ne 0 ]; then + echo "[FAIL]" + exitcode=1 +else + echo "[PASS]" +fi + exit $exitcode diff --git a/tools/testing/selftests/vm/va_128TBswitch.c b/tools/testing/selftests/vm/va_128TBswitch.c new file mode 100644 index 000000000000..dfa501b825a8 --- /dev/null +++ b/tools/testing/selftests/vm/va_128TBswitch.c @@ -0,0 +1,170 @@ +/* + * Copyright IBM Corporation, 2017 + * Author Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + * + */ + +#include <stdio.h> +#include <stdlib.h> +#include <sys/mman.h> + +#ifdef DEBUG +#define pr_debug(fmt, ...) printf(fmt, ##__VA_ARGS__) +#else +#define pr_debug(fmt, ...) +#endif + +/* + * >= 128TB is the hint addr value we used to select + * large address space. 
+ */ +#define ADDR_SWITCH_HINT (1UL << 47) + +#ifdef __powerpc64__ +#define MAP_SIZE 64*1024 +#else +#define MAP_SIZE 4*1024 +#endif + + +void report_failure(long in_addr, unsigned long flags) +{ + printf("Failed to map 0x%lx with flags 0x%lx\n", in_addr, flags); + exit(1); +} + +int *__map_addr(long in_addr, int size, unsigned long flags, int unmap) +{ + int *addr; + + addr = (int *)mmap((void *)in_addr, size, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE | flags, -1, 0); + if (addr == MAP_FAILED) + report_failure(in_addr, flags); + pr_debug("Mapped addr 0x%lx-0x%lx for request 0x%lx with flag 0x%lx\n", + (unsigned long)addr, ((unsigned long)addr + size), in_addr, flags); + /* + * Try to access to catch errors in fault handling/slb miss handling + */ + *addr = 10; + if (unmap) + munmap(addr, size); + return addr; +} + +int *map_addr(long in_addr, unsigned long flags, int unmap) +{ + return __map_addr(in_addr, MAP_SIZE, flags, unmap); +} + +void boundary_check(void) +{ + int *a; + + /* + * If stack is moved, we could possibly allocate + * this at the requested address. + */ + a = map_addr((ADDR_SWITCH_HINT - MAP_SIZE), 0, 1); + if ((unsigned long)a > ADDR_SWITCH_HINT - MAP_SIZE) + report_failure(ADDR_SWITCH_HINT - MAP_SIZE, 0); + + /* + * We should never allocate at the requested address or above it + * The len cross the 128TB boundary. Without MAP_FIXED + * we will always search in the lower address space. + */ + a = __map_addr((ADDR_SWITCH_HINT - MAP_SIZE), 2*MAP_SIZE, 0, 1); + if ((unsigned long)a >= ADDR_SWITCH_HINT - MAP_SIZE) + report_failure(ADDR_SWITCH_HINT - MAP_SIZE, 0); + /* + * Exact mapping at 128TB, the area is free we should get that + * even without MAP_FIXED. Don't unmap. We check fixed in the + * same range later. 
+ */ + a = map_addr(ADDR_SWITCH_HINT, 0, 0); + if ((unsigned long)a != ADDR_SWITCH_HINT) + report_failure(ADDR_SWITCH_HINT, 0); + + a = map_addr(ADDR_SWITCH_HINT + MAP_SIZE, 0, 1); + if ((unsigned long)a < ADDR_SWITCH_HINT) + report_failure(ADDR_SWITCH_HINT, 0); + +#if 0 + /* + * Enable this with stack mapped MAP_SIZE below 128TB + */ + a = map_addr((ADDR_SWITCH_HINT - MAP_SIZE), MAP_FIXED, 1); + if ((unsigned long)a != ADDR_SWITCH_HINT - MAP_SIZE) + report_failure(ADDR_SWITCH_HINT - MAP_SIZE, 0); + a = __map_addr((ADDR_SWITCH_HINT - MAP_SIZE), 2*MAP_SIZE, MAP_FIXED, 1); + if ((unsigned long)a != ADDR_SWITCH_HINT - MAP_SIZE) + report_failure(ADDR_SWITCH_HINT - MAP_SIZE, MAP_FIXED); + +#endif + a = map_addr(ADDR_SWITCH_HINT, MAP_FIXED, 1); + if ((unsigned long)a != ADDR_SWITCH_HINT) + report_failure(ADDR_SWITCH_HINT, MAP_FIXED); +} + +static int supported_arch(void) +{ +#if defined(__powerpc64__) + return 1; +#elif defined(__x86_64__) + return 1; +#else + return 0; +#endif + +} + +int main(int argc, char *argv[]) +{ + int *a; + + if (!supported_arch()) + return 0; + /* + * check the 128TB boundary before we update addr_limit + */ + boundary_check(); + + a = map_addr(0, 0, 1); + if ((unsigned long)a > ADDR_SWITCH_HINT) + report_failure(0, 0); + + a = map_addr(-1, 0, 1); + if ((unsigned long)a < ADDR_SWITCH_HINT) + report_failure(-1, 0); + + /* don't unmap this one */ + a = map_addr((1UL << 48), 0, 0); + if ((unsigned long)a != 1UL << 48) + report_failure((1UL << 48), 0); + + a = map_addr((1UL << 48), 0, 1); + if ((unsigned long)a < ADDR_SWITCH_HINT) + report_failure((1UL << 48), 0); + + a = map_addr(0, 0, 1); + if ((unsigned long)a > ADDR_SWITCH_HINT) + report_failure(0, 0); + + a = map_addr(-1, 0, 1); + if ((unsigned long)a < ADDR_SWITCH_HINT) + report_failure(-1, 0); + /* + * Try boundary conditions again after we allocated something above 128TB + * and updated addr_limit. 
+ */ + boundary_check(); +} -- 2.14.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging 2017-11-22 5:41 ` Aneesh Kumar K.V @ 2017-11-22 8:11 ` Kirill A. Shutemov 1 sibling, 0 replies; 16+ messages in thread From: Kirill A. Shutemov @ 2017-11-22 8:11 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote: > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > > > With 5-level paging, we have 56-bit virtual address space available for > > userspace. But we don't want to expose userspace to addresses above > > 47-bits, unless it asked specifically for it. > > > > We use mmap(2) hint address as a way for kernel to know if it's okay to > > allocate virtual memory above 47-bit. > > > > Let's add a self-test that covers few corner cases of the interface. > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > Can we move this to selftest/vm/ ? I had a variant which i was using to > test issues on ppc64. One change we did recently was to use >=128TB as > the hint addr value to select larger address space. I also would like to > check for exact mmap return addr in some case. Attaching below the test > i was using. I will check whether this patch can be updated to test what > is converted in my selftest. I also want to do the boundary check twice. > The hash trasnslation mode in POWER require us to track addr limit and > we had bugs around address space slection before and after updating the > addr limit. Feel free to move it to selftest/vm. I don't have time to set up and test it on Power myself, but this would be great. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging 2017-11-22 8:11 ` Kirill A. Shutemov @ 2017-11-22 11:36 ` Aneesh Kumar K.V 1 sibling, 0 replies; 16+ messages in thread From: Aneesh Kumar K.V @ 2017-11-22 11:36 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel "Kirill A. Shutemov" <kirill@shutemov.name> writes: > On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote: >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: >> >> > With 5-level paging, we have 56-bit virtual address space available for >> > userspace. But we don't want to expose userspace to addresses above >> > 47-bits, unless it asked specifically for it. >> > >> > We use mmap(2) hint address as a way for kernel to know if it's okay to >> > allocate virtual memory above 47-bit. >> > >> > Let's add a self-test that covers few corner cases of the interface. >> > >> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> >> >> Can we move this to selftest/vm/ ? I had a variant which i was using to >> test issues on ppc64. One change we did recently was to use >=128TB as >> the hint addr value to select larger address space. I also would like to >> check for exact mmap return addr in some case. Attaching below the test >> i was using. I will check whether this patch can be updated to test what >> is converted in my selftest. I also want to do the boundary check twice. >> The hash trasnslation mode in POWER require us to track addr limit and >> we had bugs around address space slection before and after updating the >> addr limit. > > Feel free to move it to selftest/vm. I don't have time to test setup and > test it on Power myself, but this would be great. > How about the below? Do you want me to send this as a patch to the list? 
#include <stdio.h> #include <sys/mman.h> #include <string.h> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) #ifdef __powerpc64__ #define PAGE_SIZE 64*1024 /* * This will work with 16M and 2M hugepage size */ #define HUGETLB_SIZE 16*1024*1024 #else #define PAGE_SIZE 4096 #define HUGETLB_SIZE 2*1024*1024 #endif /* * >= 128TB is the hint addr value we used to select * large address space. */ #define ADDR_SWITCH_HINT (1UL << 47) #define LOW_ADDR ((void *) (1UL << 30)) #define HIGH_ADDR ((void *) (1UL << 48)) struct testcase { void *addr; unsigned long size; unsigned long flags; const char *msg; unsigned int addr_check_cond; unsigned int low_addr_required:1; unsigned int keep_mapped:1; }; static struct testcase testcases[] = { { /* * If stack is moved, we could possibly allocate * this at the requested address. */ .addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)), .size = PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)", .low_addr_required = 1, }, { /* * We should never allocate at the requested address or above it * The len cross the 128TB boundary. Without MAP_FIXED * we will always search in the lower address space. */ .addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)), .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, (2 * PAGE_SIZE))", .low_addr_required = 1, }, { /* * Exact mapping at 128TB, the area is free we should get that * even without MAP_FIXED. 
*/ .addr = ((void *)(ADDR_SWITCH_HINT)), .size = PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)", .keep_mapped = 1, }, { .addr = (void *)(ADDR_SWITCH_HINT), .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, .msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)", }, { .addr = NULL, .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(NULL)", .low_addr_required = 1, }, { .addr = LOW_ADDR, .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(LOW_ADDR)", .low_addr_required = 1, }, { .addr = HIGH_ADDR, .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(HIGH_ADDR)", .keep_mapped = 1, }, { .addr = HIGH_ADDR, .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(HIGH_ADDR) again", .keep_mapped = 1, }, { .addr = HIGH_ADDR, .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, .msg = "mmap(HIGH_ADDR, MAP_FIXED)", }, { .addr = (void*) -1, .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(-1)", .keep_mapped = 1, }, { .addr = (void*) -1, .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(-1) again", }, { .addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)), .size = PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)", .low_addr_required = 1, }, { .addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE), .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2 * PAGE_SIZE)", .low_addr_required = 1, .keep_mapped = 1, }, { .addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE / 2), .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE/2 , 2 * PAGE_SIZE)", .low_addr_required = 1, .keep_mapped = 1, }, { .addr = ((void *)(ADDR_SWITCH_HINT)), .size = PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)", 
}, { .addr = (void *)(ADDR_SWITCH_HINT), .size = 2 * PAGE_SIZE, .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, .msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)", }, }; static struct testcase hugetlb_testcases[] = { { .addr = NULL, .size = HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(NULL, MAP_HUGETLB)", .low_addr_required = 1, }, { .addr = LOW_ADDR, .size = HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(LOW_ADDR, MAP_HUGETLB)", .low_addr_required = 1, }, { .addr = HIGH_ADDR, .size = HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(HIGH_ADDR, MAP_HUGETLB)", .keep_mapped = 1, }, { .addr = HIGH_ADDR, .size = HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again", .keep_mapped = 1, }, { .addr = HIGH_ADDR, .size = HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, .msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)", }, { .addr = (void*) -1, .size = HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(-1, MAP_HUGETLB)", .keep_mapped = 1, }, { .addr = (void*) -1, .size = HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap(-1, MAP_HUGETLB) again", }, { .addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE), .size = 2 * HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, .msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)", .low_addr_required = 1, .keep_mapped = 1, }, { .addr = (void *)(ADDR_SWITCH_HINT), .size = 2 * HUGETLB_SIZE, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, .msg = "mmap(ADDR_SWITCH_HINT , 2 * HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB)", }, }; static void run_test(struct testcase *test, int count) { int i; void *p; for (i = 0; i < count; i++) { struct testcase *t = test + i; p = mmap(t->addr, t->size, PROT_READ | PROT_WRITE, t->flags, -1, 0); printf("%s: %p - ", t->msg, p); if (p == MAP_FAILED) { 
printf("FAILED\n"); continue; } if (t->low_addr_required && p >= (void *)(1UL << 47)) printf("FAILED\n"); else { /* * Do a dereference of the address returned so that we catch * bugs in page fault handling */ *(int *)p = 10; printf("OK\n"); } if (!t->keep_mapped) munmap(p, t->size); } } static int supported_arch(void) { #if defined(__powerpc64__) return 1; #elif defined(__x86_64__) return 1; #else return 0; #endif } int main(int argc, char **argv) { if (!supported_arch()) return 0; run_test(testcases, ARRAY_SIZE(testcases)); if (argc == 2 && !strcmp(argv[1], "--run_hugetlb")) run_test(hugetlb_testcases, ARRAY_SIZE(hugetlb_testcases)); return 0; } ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging 2017-11-22 11:36 ` Aneesh Kumar K.V @ 2017-11-22 12:40 ` Kirill A. Shutemov 1 sibling, 0 replies; 16+ messages in thread From: Kirill A. Shutemov @ 2017-11-22 12:40 UTC (permalink / raw) To: Aneesh Kumar K.V Cc: Kirill A. Shutemov, Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel On Wed, Nov 22, 2017 at 05:06:27PM +0530, Aneesh Kumar K.V wrote: > "Kirill A. Shutemov" <kirill@shutemov.name> writes: > > > On Wed, Nov 22, 2017 at 11:11:36AM +0530, Aneesh Kumar K.V wrote: > >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes: > >> > >> > With 5-level paging, we have 56-bit virtual address space available for > >> > userspace. But we don't want to expose userspace to addresses above > >> > 47-bits, unless it asked specifically for it. > >> > > >> > We use mmap(2) hint address as a way for kernel to know if it's okay to > >> > allocate virtual memory above 47-bit. > >> > > >> > Let's add a self-test that covers few corner cases of the interface. > >> > > >> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > >> > >> Can we move this to selftest/vm/ ? I had a variant which i was using to > >> test issues on ppc64. One change we did recently was to use >=128TB as > >> the hint addr value to select larger address space. I also would like to > >> check for exact mmap return addr in some case. Attaching below the test > >> i was using. I will check whether this patch can be updated to test what > >> is converted in my selftest. I also want to do the boundary check twice. > >> The hash trasnslation mode in POWER require us to track addr limit and > >> we had bugs around address space slection before and after updating the > >> addr limit. > > > > Feel free to move it to selftest/vm. I don't have time to test setup and > > test it on Power myself, but this would be great. 
> > > > How about the below? Do you want me to send this as a patch to the list? Yes, please. It actually triggered one hugetlb bug I made (patch sent). Some feedback below. > #include <stdio.h> > #include <sys/mman.h> > #include <string.h> > > #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) > > #ifdef __powerpc64__ > #define PAGE_SIZE 64*1024 > /* > * This will work with 16M and 2M hugepage size > */ > #define HUGETLB_SIZE 16*1024*1024 > #else > #define PAGE_SIZE 4096 > #define HUGETLB_SIZE 2*1024*1024 > #endif > > /* > * >= 128TB is the hint addr value we used to select > * large address space. > */ > #define ADDR_SWITCH_HINT (1UL << 47) > #define LOW_ADDR ((void *) (1UL << 30)) > #define HIGH_ADDR ((void *) (1UL << 48)) > > struct testcase { > void *addr; > unsigned long size; > unsigned long flags; > const char *msg; > unsigned int addr_check_cond; Unused? > unsigned int low_addr_required:1; > unsigned int keep_mapped:1; > }; > > static struct testcase testcases[] = { > { > /* > * If stack is moved, we could possibly allocate > * this at the requested address. > */ > .addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)), > .size = PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)", > .low_addr_required = 1, > }, > { > /* > * We should never allocate at the requested address or above it > * The len cross the 128TB boundary. Without MAP_FIXED > * we will always search in the lower address space. > */ > .addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)), > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, (2 * PAGE_SIZE))", > .low_addr_required = 1, > }, > { > /* > * Exact mapping at 128TB, the area is free we should get that > * even without MAP_FIXED. 
> */ > .addr = ((void *)(ADDR_SWITCH_HINT)), > .size = PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)", > .keep_mapped = 1, > }, > { > .addr = (void *)(ADDR_SWITCH_HINT), > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, > .msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)", > }, > { > .addr = NULL, > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(NULL)", > .low_addr_required = 1, > }, > { > .addr = LOW_ADDR, > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(LOW_ADDR)", > .low_addr_required = 1, > }, > { > .addr = HIGH_ADDR, > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(HIGH_ADDR)", > .keep_mapped = 1, > }, > { > .addr = HIGH_ADDR, > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(HIGH_ADDR) again", > .keep_mapped = 1, > }, > { > .addr = HIGH_ADDR, > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, > .msg = "mmap(HIGH_ADDR, MAP_FIXED)", > }, > { > .addr = (void*) -1, > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(-1)", > .keep_mapped = 1, > }, > { > .addr = (void*) -1, > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(-1) again", > }, > { > .addr = ((void *)(ADDR_SWITCH_HINT - PAGE_SIZE)), > .size = PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, PAGE_SIZE)", > .low_addr_required = 1, > }, > { > .addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE), > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2 * PAGE_SIZE)", > .low_addr_required = 1, > .keep_mapped = 1, > }, > { > .addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE / 2), > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(ADDR_SWITCH_HINT - PAGE_SIZE/2 , 2 * PAGE_SIZE)", > .low_addr_required = 1, > 
.keep_mapped = 1, > }, > { > .addr = ((void *)(ADDR_SWITCH_HINT)), > .size = PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(ADDR_SWITCH_HINT, PAGE_SIZE)", > }, > { > .addr = (void *)(ADDR_SWITCH_HINT), > .size = 2 * PAGE_SIZE, > .flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, > .msg = "mmap(ADDR_SWITCH_HINT, 2 * PAGE_SIZE, MAP_FIXED)", > }, > }; > > static struct testcase hugetlb_testcases[] = { > { > .addr = NULL, > .size = HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(NULL, MAP_HUGETLB)", > .low_addr_required = 1, > }, > { > .addr = LOW_ADDR, > .size = HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(LOW_ADDR, MAP_HUGETLB)", > .low_addr_required = 1, > }, > { > .addr = HIGH_ADDR, > .size = HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(HIGH_ADDR, MAP_HUGETLB)", > .keep_mapped = 1, > }, > { > .addr = HIGH_ADDR, > .size = HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(HIGH_ADDR, MAP_HUGETLB) again", > .keep_mapped = 1, > }, > { > .addr = HIGH_ADDR, > .size = HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, > .msg = "mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB)", > }, > { > .addr = (void*) -1, > .size = HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(-1, MAP_HUGETLB)", > .keep_mapped = 1, > }, > { > .addr = (void*) -1, > .size = HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap(-1, MAP_HUGETLB) again", > }, > { > .addr = (void *)(ADDR_SWITCH_HINT - PAGE_SIZE), > .size = 2 * HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, > .msg = "mmap((1UL << 47), 4UL << 20, MAP_HUGETLB)", > .low_addr_required = 1, > .keep_mapped = 1, > }, > { > .addr = (void *)(ADDR_SWITCH_HINT), > .size = 2 * HUGETLB_SIZE, > .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, > .msg = "mmap(ADDR_SWITCH_HINT , 2 
* HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB)", > }, > }; > > static void run_test(struct testcase *test, int count) > { > int i; > void *p; > > for (i = 0; i < count; i++) { > struct testcase *t = test + i; > > p = mmap(t->addr, t->size, PROT_READ | PROT_WRITE, t->flags, -1, 0); > > printf("%s: %p - ", t->msg, p); > > if (p == MAP_FAILED) { > printf("FAILED\n"); > continue; > } > > if (t->low_addr_required && p >= (void *)(1UL << 47)) > printf("FAILED\n"); > else { > /* > * Do a dereference of the address returned so that we catch > * bugs in page fault handling > */ > *(int *)p = 10; > printf("OK\n"); > } > if (!t->keep_mapped) > munmap(p, t->size); > } > } > > static int supported_arch(void) > { > #if defined(__powerpc64__) > return 1; > #elif defined(__x86_64__) > return 1; > #else > return 0; > #endif > } > > int main(int argc, char **argv) > { > if (!supported_arch()) > return 0; > > run_test(testcases, ARRAY_SIZE(testcases)); > if (argc == 2 && !strcmp(argv[1], "--run_hugetlb")) --run-hugetlb would be better I think. > run_test(hugetlb_testcases, ARRAY_SIZE(hugetlb_testcases)); > return 0; > } > -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 16+ messages in thread
* [tip:x86/urgent] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border 2017-11-15 14:36 ` Kirill A. Shutemov (?) (?) @ 2017-11-16 10:46 ` tip-bot for Kirill A. Shutemov -1 siblings, 0 replies; 16+ messages in thread From: tip-bot for Kirill A. Shutemov @ 2017-11-16 10:46 UTC (permalink / raw) To: linux-tip-commits Cc: kirill.shutemov, hpa, tglx, luto, linux-kernel, npiggin, torvalds, mingo Commit-ID: 1e0f25dbf2464df8445dd40881f4d9e732434947 Gitweb: https://git.kernel.org/tip/1e0f25dbf2464df8445dd40881f4d9e732434947 Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> AuthorDate: Wed, 15 Nov 2017 17:36:06 +0300 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 16 Nov 2017 11:43:11 +0100 x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border In case of 5-level paging, the kernel does not place any mapping above 47-bit, unless userspace explicitly asks for it. Userspace can request an allocation from the full address space by specifying the mmap address hint above 47-bit. Nicholas noticed that the current implementation violates this interface: If user space requests a mapping at the end of the 47-bit address space with a length which causes the mapping to cross the 47-bit border (DEFAULT_MAP_WINDOW), then the vma is partially in the address space below and above. Sanity check the mmap address hint so that start and end of the resulting vma are on the same side of the 47-bit border. If that's not the case fall back to the code path which ignores the address hint and allocate from the regular address space below 47-bit. To make the checks consistent, mask out the address hints lower bits (either PAGE_MASK or huge_page_mask()) instead of using ALIGN() which can push them up to the next boundary. [ tglx: Moved the address check to a function and massaged comment and changelog ] Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Kirill A. 
Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20171115143607.81541-1-kirill.shutemov@linux.intel.com --- arch/x86/include/asm/elf.h | 1 + arch/x86/kernel/sys_x86_64.c | 10 +++++++--- arch/x86/mm/hugetlbpage.c | 11 ++++++++--- arch/x86/mm/mmap.c | 46 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 62 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 3a091ce..0d157d2 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void) extern unsigned long task_size_32bit(void); extern unsigned long task_size_64bit(int full_addr_space); extern unsigned long get_mmap_base(int is_legacy); +extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len); #ifdef CONFIG_X86_32 diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c index a63fe77..676774b 100644 --- a/arch/x86/kernel/sys_x86_64.c +++ b/arch/x86/kernel/sys_x86_64.c @@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, if (len > TASK_SIZE) return -ENOMEM; + /* No address checking. 
See comment at mmap_address_hint_valid() */ if (flags & MAP_FIXED) return addr; @@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, /* requesting a specific address */ if (addr) { - addr = PAGE_ALIGN(addr); + addr &= PAGE_MASK; + if (!mmap_address_hint_valid(addr, len)) + goto get_unmapped_area; + vma = find_vma(mm, addr); - if (TASK_SIZE - len >= addr && - (!vma || addr + len <= vm_start_gap(vma))) + if (!vma || addr + len <= vm_start_gap(vma)) return addr; } +get_unmapped_area: info.flags = VM_UNMAPPED_AREA_TOPDOWN; info.length = len; diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index 8ae0000..00b2966 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, if (len > TASK_SIZE) return -ENOMEM; + /* No address checking. See comment at mmap_address_hint_valid() */ if (flags & MAP_FIXED) { if (prepare_hugepage_range(file, addr, len)) return -EINVAL; @@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, } if (addr) { - addr = ALIGN(addr, huge_page_size(h)); + addr &= huge_page_mask(h); + if (!mmap_address_hint_valid(addr, len)) + goto get_unmapped_area; + vma = find_vma(mm, addr); - if (TASK_SIZE - len >= addr && - (!vma || addr + len <= vm_start_gap(vma))) + if (!vma || addr + len <= vm_start_gap(vma)) return addr; } + +get_unmapped_area: if (mm->get_unmapped_area == arch_get_unmapped_area) return hugetlb_get_unmapped_area_bottomup(file, addr, len, pgoff, flags); diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index a996798..62285fe 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma) return "[mpx]"; return NULL; } + +/** + * mmap_address_hint_valid - Validate the address hint of mmap + * @addr: Address hint + * @len: Mapping length + * + * Check whether @addr and @addr + @len result in 
a valid mapping. + * + * On 32bit this only checks whether @addr + @len is <= TASK_SIZE. + * + * On 64bit with 5-level page tables another sanity check is required + * because mappings requested by mmap(@addr, 0) which cross the 47-bit + * virtual address boundary can cause the following theoretical issue: + * + * An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr + * is below the border of the 47-bit address space and @addr + @len is + * above the border. + * + * With 4-level paging this request succeeds, but the resulting mapping + * address will always be within the 47-bit virtual address space, because + * the hint address does not result in a valid mapping and is + * ignored. Hence applications which are not prepared to handle virtual + * addresses above 47-bit work correctly. + * + * With 5-level paging this request would be granted and result in a + * mapping which crosses the border of the 47-bit virtual address + * space. If the application cannot handle addresses above 47-bit this + * will lead to misbehaviour and hard to diagnose failures. + * + * Therefore ignore address hints which would result in a mapping crossing + * the 47-bit virtual address boundary. + * + * Note, that in the same scenario with MAP_FIXED the behaviour is + * different. The request with @addr < 47-bit and @addr + @len > 47-bit + * fails on a 4-level paging machine but succeeds on a 5-level paging + * machine. It is reasonable to expect that an application does not rely on + * the failure of such a fixed mapping request, so the restriction is not + * applied. + */ +bool mmap_address_hint_valid(unsigned long addr, unsigned long len) +{ + if (TASK_SIZE - len < addr) + return false; + + return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW); +} ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border 2017-11-15 14:36 ` Kirill A. Shutemov @ 2017-11-16 13:23 ` Michal Hocko -1 siblings, 0 replies; 16+ messages in thread From: Michal Hocko @ 2017-11-16 13:23 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Ingo Molnar, x86, Thomas Gleixner, H. Peter Anvin, Linus Torvalds, Andy Lutomirski, Nicholas Piggin, linux-mm, linux-kernel On Wed 15-11-17 17:36:06, Kirill A. Shutemov wrote: > In case of 5-level paging, the kernel does not place any mapping above > 47-bit, unless userspace explicitly asks for it. > > Userspace can request an allocation from the full address space by > specifying the mmap address hint above 47-bit. > > Nicholas noticed that the current implementation violates this interface: > > If user space requests a mapping at the end of the 47-bit address space > with a length which causes the mapping to cross the 47-bit border > (DEFAULT_MAP_WINDOW), then the vma is partially in the address space > below and above. > > Sanity check the mmap address hint so that start and end of the resulting > vma are on the same side of the 47-bit border. If that's not the case fall > back to the code path which ignores the address hint and allocate from the > regular address space below 47-bit. > > [ tglx: Moved the address check to a function and massaged comment and > changelog ] > > Reported-by: Nicholas Piggin <npiggin@gmail.com> > Signed-off-by: Kirill A. 
Shutemov <kirill.shutemov@linux.intel.com> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> FWIW Acked-by: Michal Hocko <mhocko@suse.com> > --- > arch/x86/include/asm/elf.h | 1 + > arch/x86/kernel/sys_x86_64.c | 10 +++++++--- > arch/x86/mm/hugetlbpage.c | 11 ++++++++--- > arch/x86/mm/mmap.c | 46 ++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 62 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h > index 3a091cea36c5..0d157d2a1e2a 100644 > --- a/arch/x86/include/asm/elf.h > +++ b/arch/x86/include/asm/elf.h > @@ -309,6 +309,7 @@ static inline int mmap_is_ia32(void) > extern unsigned long task_size_32bit(void); > extern unsigned long task_size_64bit(int full_addr_space); > extern unsigned long get_mmap_base(int is_legacy); > +extern bool mmap_address_hint_valid(unsigned long addr, unsigned long len); > > #ifdef CONFIG_X86_32 > > diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c > index a63fe77b3217..676774b9bb8d 100644 > --- a/arch/x86/kernel/sys_x86_64.c > +++ b/arch/x86/kernel/sys_x86_64.c > @@ -188,6 +188,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, > if (len > TASK_SIZE) > return -ENOMEM; > > + /* No address checking. 
See comment at mmap_address_hint_valid() */ > if (flags & MAP_FIXED) > return addr; > > @@ -197,12 +198,15 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0, > > /* requesting a specific address */ > if (addr) { > - addr = PAGE_ALIGN(addr); > + addr &= PAGE_MASK; > + if (!mmap_address_hint_valid(addr, len)) > + goto get_unmapped_area; > + > vma = find_vma(mm, addr); > - if (TASK_SIZE - len >= addr && > - (!vma || addr + len <= vm_start_gap(vma))) > + if (!vma || addr + len <= vm_start_gap(vma)) > return addr; > } > +get_unmapped_area: > > info.flags = VM_UNMAPPED_AREA_TOPDOWN; > info.length = len; > diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c > index 8ae0000cbdb3..00b296617ca4 100644 > --- a/arch/x86/mm/hugetlbpage.c > +++ b/arch/x86/mm/hugetlbpage.c > @@ -158,6 +158,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, > if (len > TASK_SIZE) > return -ENOMEM; > > + /* No address checking. See comment at mmap_address_hint_valid() */ > if (flags & MAP_FIXED) { > if (prepare_hugepage_range(file, addr, len)) > return -EINVAL; > @@ -165,12 +166,16 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, > } > > if (addr) { > - addr = ALIGN(addr, huge_page_size(h)); > + addr &= huge_page_mask(h); > + if (!mmap_address_hint_valid(addr, len)) > + goto get_unmapped_area; > + > vma = find_vma(mm, addr); > - if (TASK_SIZE - len >= addr && > - (!vma || addr + len <= vm_start_gap(vma))) > + if (!vma || addr + len <= vm_start_gap(vma)) > return addr; > } > + > +get_unmapped_area: > if (mm->get_unmapped_area == arch_get_unmapped_area) > return hugetlb_get_unmapped_area_bottomup(file, addr, len, > pgoff, flags); > diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c > index a99679826846..62285fe77b0f 100644 > --- a/arch/x86/mm/mmap.c > +++ b/arch/x86/mm/mmap.c > @@ -174,3 +174,49 @@ const char *arch_vma_name(struct vm_area_struct *vma) > return "[mpx]"; > return NULL; > } > + > +/** > + * 
mmap_address_hint_valid - Validate the address hint of mmap > + * @addr: Address hint > + * @len: Mapping length > + * > + * Check whether @addr and @addr + @len result in a valid mapping. > + * > + * On 32bit this only checks whether @addr + @len is <= TASK_SIZE. > + * > + * On 64bit with 5-level page tables another sanity check is required > + * because mappings requested by mmap(@addr, 0) which cross the 47-bit > + * virtual address boundary can cause the following theoretical issue: > + * > + * An application calls mmap(addr, 0), i.e. without MAP_FIXED, where @addr > + * is below the border of the 47-bit address space and @addr + @len is > + * above the border. > + * > + * With 4-level paging this request succeeds, but the resulting mapping > + * address will always be within the 47-bit virtual address space, because > + * the hint address does not result in a valid mapping and is > + * ignored. Hence applications which are not prepared to handle virtual > + * addresses above 47-bit work correctly. > + * > + * With 5-level paging this request would be granted and result in a > + * mapping which crosses the border of the 47-bit virtual address > + * space. If the application cannot handle addresses above 47-bit this > + * will lead to misbehaviour and hard to diagnose failures. > + * > + * Therefore ignore address hints which would result in a mapping crossing > + * the 47-bit virtual address boundary. > + * > + * Note, that in the same scenario with MAP_FIXED the behaviour is > + * different. The request with @addr < 47-bit and @addr + @len > 47-bit > + * fails on a 4-level paging machine but succeeds on a 5-level paging > + * machine. It is reasonable to expect that an application does not rely on > + * the failure of such a fixed mapping request, so the restriction is not > + * applied. 
> + */ > +bool mmap_address_hint_valid(unsigned long addr, unsigned long len) > +{ > + if (TASK_SIZE - len < addr) > + return false; > + > + return (addr > DEFAULT_MAP_WINDOW) == (addr + len > DEFAULT_MAP_WINDOW); > +} > -- > 2.15.0 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
Thread overview: 16+ messages (newest: 2017-11-22 12:40 UTC)

2017-11-15 14:36 [PATCHv3 1/2] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border -- Kirill A. Shutemov
2017-11-15 14:36 ` [PATCHv3 2/2] x86/selftests: Add test for mapping placement for 5-level paging -- Kirill A. Shutemov
2017-11-16 10:46   ` [tip:x86/urgent] -- tip-bot for Kirill A. Shutemov
2017-11-22  5:41   ` [PATCHv3 2/2] -- Aneesh Kumar K.V
2017-11-22  8:11     ` Kirill A. Shutemov
2017-11-22 11:36       ` Aneesh Kumar K.V
2017-11-22 12:40         ` Kirill A. Shutemov
2017-11-16 10:46 ` [tip:x86/urgent] x86/mm: Prevent non-MAP_FIXED mapping across DEFAULT_MAP_WINDOW border -- tip-bot for Kirill A. Shutemov
2017-11-16 13:23 ` [PATCHv3 1/2] -- Michal Hocko