linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] Fix compatible mmap() return pointer over 4Gb
@ 2017-01-11 18:17 Dmitry Safonov
  2017-01-11 18:17 ` [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall Dmitry Safonov
  2017-01-11 18:17 ` [PATCH 2/2] selftests/x86: add test to check compat mmap() return addr Dmitry Safonov
  0 siblings, 2 replies; 8+ messages in thread
From: Dmitry Safonov @ 2017-01-11 18:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86

A fix for the mmap() bug that I referenced in [1],
plus a selftest for it.

If the fix is accepted, I would like it marked for the stable
v4.9 kernel, as I am trying to support 32-bit C/R on v4.9 and
later, and a working compat mmap() is really needed there.

[1]: https://marc.info/?l=linux-kernel&m=148311451525315

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org

Dmitry Safonov (2):
  x86/mm: don't mmap() over 4GB with compat syscall
  selftests/x86: add test to check compat mmap() return addr

 arch/x86/kernel/sys_x86_64.c                   |  37 ++++-
 tools/testing/selftests/x86/Makefile           |   2 +-
 tools/testing/selftests/x86/test_compat_mmap.c | 200 +++++++++++++++++++++++++
 3 files changed, 230 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/x86/test_compat_mmap.c

-- 
2.11.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall
  2017-01-11 18:17 [PATCH 0/2] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
@ 2017-01-11 18:17 ` Dmitry Safonov
  2017-01-11 22:26   ` Andy Lutomirski
  2017-01-12 11:51   ` kbuild test robot
  2017-01-11 18:17 ` [PATCH 2/2] selftests/x86: add test to check compat mmap() return addr Dmitry Safonov
  1 sibling, 2 replies; 8+ messages in thread
From: Dmitry Safonov @ 2017-01-11 18:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86

While fixing CRIU bugs in the ZDTM tests for 32-bit C/R, I found
that the compat ia32/x32 syscalls mmap() and mmap2() can return an
address above 4Gb in x86_64 applications, in which case only the
lower 4 bytes of the address are returned and the higher bytes are
dropped.
This happens because the mmap() upper limit is the same for native
and compat syscalls in a 64-bit task: (TASK_UNMAPPED_BASE +
random_factor), which is (PAGE_ALIGN(TASK_SIZE / 3) + random_factor)
(for legacy mmap it's just TASK_SIZE).
This patch limits the highest address that can be mmaped with compat
syscalls in 64-bit applications to IA32_PAGE_OFFSET (+ randomization).

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/kernel/sys_x86_64.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index a55ed63b9f91..0893725db6e6 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -100,7 +100,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 static void find_start_end(unsigned long flags, unsigned long *begin,
 			   unsigned long *end)
 {
-	if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
+	if (!test_thread_flag(TIF_ADDR32)) {
 		/* This is usually used needed to map code in small
 		   model, so it needs to be in the first 31bit. Limit
 		   it to that.  This means we need to move the
@@ -109,14 +109,24 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
 		   malloc knows how to fall back to mmap. Give it 1GB
 		   of playground for now. -AK */
 		*begin = 0x40000000;
-		*end = 0x80000000;
-		if (current->flags & PF_RANDOMIZE) {
-			*begin = randomize_page(*begin, 0x02000000);
+
+		if (flags & MAP_32BIT) {
+			if (current->flags & PF_RANDOMIZE)
+				*begin = randomize_page(*begin, 0x02000000);
+			*end = 0x80000000;
+			return;
+		}
+		if (current->thread.status & TS_COMPAT) {
+			if (current->flags & PF_RANDOMIZE)
+				*begin = randomize_page(*begin,
+					1UL << mmap_rnd_compat_bits);
+			*end = IA32_PAGE_OFFSET;
+			return;
 		}
-	} else {
-		*begin = current->mm->mmap_legacy_base;
-		*end = TASK_SIZE;
 	}
+
+	*begin = current->mm->mmap_legacy_base;
+	*end = TASK_SIZE;
 }
 
 unsigned long
@@ -187,10 +197,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return addr;
 	}
 
+	if (current->thread.status & TS_COMPAT) {
+		if (current->flags & PF_RANDOMIZE) {
+			unsigned long rnd = 1UL << mmap_rnd_compat_bits;
+
+			info.high_limit =
+				randomize_page(IA32_PAGE_OFFSET - rnd, rnd);
+		} else {
+			info.high_limit = IA32_PAGE_OFFSET;
+		}
+	} else {
+		info.high_limit = mm->mmap_base;
+	}
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
 	info.low_limit = PAGE_SIZE;
-	info.high_limit = mm->mmap_base;
 	info.align_mask = 0;
 	info.align_offset = pgoff << PAGE_SHIFT;
 	if (filp) {
-- 
2.11.0


* [PATCH 2/2] selftests/x86: add test to check compat mmap() return addr
  2017-01-11 18:17 [PATCH 0/2] Fix compatible mmap() return pointer over 4Gb Dmitry Safonov
  2017-01-11 18:17 ` [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall Dmitry Safonov
@ 2017-01-11 18:17 ` Dmitry Safonov
  1 sibling, 0 replies; 8+ messages in thread
From: Dmitry Safonov @ 2017-01-11 18:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: 0x7f454c46, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, x86

We can't just install a segfault handler and probe the address
returned by the compat mmap() syscall, because its lower 4 bytes
may coincide with an already existing VMA. So the test parses
/proc/self/maps, finds the new VMAs that appeared after the compat
sys_mmap() call, and checks whether the mmaped VMA is in that list.

On failure it prints:
[NOTE]	Allocated mmap 0x6f36a000, sized 0x400000
[NOTE]	New mapping appeared: 0x7f936f36a000
[FAIL]	Found VMA [0x7f936f36a000, 0x7f936f76a000] in maps file, that was allocated with compat syscall

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 tools/testing/selftests/x86/Makefile           |   2 +-
 tools/testing/selftests/x86/test_compat_mmap.c | 200 +++++++++++++++++++++++++
 2 files changed, 201 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/test_compat_mmap.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 8c1cb423cfe6..9c3e746a6064 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -10,7 +10,7 @@ TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_sysc
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
-TARGETS_C_64BIT_ONLY := fsgsbase
+TARGETS_C_64BIT_ONLY := fsgsbase test_compat_mmap
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
diff --git a/tools/testing/selftests/x86/test_compat_mmap.c b/tools/testing/selftests/x86/test_compat_mmap.c
new file mode 100644
index 000000000000..4736695afea0
--- /dev/null
+++ b/tools/testing/selftests/x86/test_compat_mmap.c
@@ -0,0 +1,200 @@
+/*
+ * Check that compat 32-bit mmap() returns address < 4Gb on 64-bit.
+ *
+ * Copyright (c) 2017 Dmitry Safonov (Virtuozzo)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <sys/mman.h>
+#include <sys/types.h>
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <signal.h>
+#include <stdlib.h>
+
+#define PAGE_SIZE 4096
+#define MMAP_SIZE (PAGE_SIZE*1024)
+#define MAX_VMAS 50
+#define BUF_SIZE 1024
+
+#ifndef __NR32_mmap2
+#define __NR32_mmap2 192
+#endif
+
+struct syscall_args32 {
+	uint32_t nr, arg0, arg1, arg2, arg3, arg4, arg5;
+};
+
+static void do_full_int80(struct syscall_args32 *args)
+{
+	asm volatile ("int $0x80"
+		      : "+a" (args->nr),
+			"+b" (args->arg0), "+c" (args->arg1), "+d" (args->arg2),
+			"+S" (args->arg3), "+D" (args->arg4),
+			"+rbp" (args->arg5)
+			: : "r8", "r9", "r10", "r11");
+}
+
+void *mmap2(void *addr, size_t len, int prot, int flags,
+	int fildes, off_t off)
+{
+	struct syscall_args32 s;
+
+	s.nr	= __NR32_mmap2;
+	s.arg0	= (uint32_t)(uintptr_t)addr;
+	s.arg1	= (uint32_t)len;
+	s.arg2	= prot;
+	s.arg3	= flags;
+	s.arg4	= fildes;
+	s.arg5	= (uint32_t)off;
+
+	do_full_int80(&s);
+
+	return (void *)(uintptr_t)s.nr;
+}
+
+struct vm_area {
+	unsigned long start;
+	unsigned long end;
+};
+
+static struct vm_area vmas_before_mmap[MAX_VMAS];
+static struct vm_area vmas_after_mmap[MAX_VMAS];
+
+static char buf[BUF_SIZE];
+
+int parse_maps(struct vm_area *vmas)
+{
+	FILE *maps;
+	int i;
+
+	maps = fopen("/proc/self/maps", "r");
+	if (maps == NULL) {
+		printf("[ERROR]\tFailed to open maps file: %m\n");
+		return -1;
+	}
+
+	for (i = 0; i < MAX_VMAS; i++) {
+		struct vm_area *v = &vmas[i];
+		char *end;
+
+		if (fgets(buf, BUF_SIZE, maps) == NULL)
+			break;
+
+		v->start = strtoul(buf, &end, 16);
+		v->end = strtoul(end + 1, NULL, 16);
+	}
+
+	if (i == MAX_VMAS) {
+		printf("[ERROR]\tNumber of VMAs is bigger than reserved array's size\n");
+		return -1;
+	}
+
+	if (fclose(maps)) {
+		printf("[ERROR]\tFailed to close maps file: %m\n");
+		return -1;
+	}
+	return 0;
+}
+
+int compare_vmas(struct vm_area *vmax, struct vm_area *vmay)
+{
+	if (vmax->start > vmay->start)
+		return 1;
+	if (vmax->start < vmay->start)
+		return -1;
+	if (vmax->end > vmay->end)
+		return 1;
+	if (vmax->end < vmay->end)
+		return -1;
+	return 0;
+}
+
+unsigned long vma_size(struct vm_area *v)
+{
+	return v->end - v->start;
+}
+
+int find_new_vma_like(struct vm_area *vma)
+{
+	int i, j = 0, found_alike = -1;
+
+	for (i = 0; i < MAX_VMAS && j < MAX_VMAS; i++, j++) {
+		int cmp = compare_vmas(&vmas_before_mmap[i],
+				&vmas_after_mmap[j]);
+
+		if (cmp == 0)
+			continue;
+		if (cmp < 0) /* Lost mapping */
+			j--;
+
+		printf("[NOTE]\tNew mapping appeared: %#lx\n",
+				vmas_after_mmap[j].start);
+		i--;
+		if (!compare_vmas(&vmas_after_mmap[j], vma))
+			return 0;
+
+		if (((vmas_after_mmap[j].start & 0xffffffff) == vma->start) &&
+				(vma_size(&vmas_after_mmap[j]) == vma_size(vma)))
+			found_alike = j;
+	}
+
+	/* New VMAs left in the tail of the after-array */
+	for (; j < MAX_VMAS; j++)
+		if (!compare_vmas(&vmas_after_mmap[j], vma))
+			return 0;
+
+	if (found_alike != -1) {
+		printf("[FAIL]\tFound VMA [%#lx, %#lx] in maps file, that was allocated with compat syscall\n",
+			vmas_after_mmap[found_alike].start,
+			vmas_after_mmap[found_alike].end);
+		return -1;
+	}
+
+	printf("[ERROR]\tCan't find [%#lx, %#lx] in maps file\n",
+		vma->start, vma->end);
+	return -1;
+}
+
+int main(int argc, char **argv)
+{
+	void *map;
+	struct vm_area vma;
+
+	if (parse_maps(vmas_before_mmap)) {
+		printf("[ERROR]\tFailed to parse maps file\n");
+		return 1;
+	}
+
+	map = mmap2(0, MMAP_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
+			MAP_PRIVATE | MAP_ANON, -1, 0);
+	if (map == MAP_FAILED)
+		printf("[WARN]\tmmap2 failed\n");
+	else
+		printf("[NOTE]\tAllocated mmap %p, sized %#x\n", map, MMAP_SIZE);
+
+	if (parse_maps(vmas_after_mmap)) {
+		printf("[ERROR]\tFailed to parse maps file\n");
+		return 1;
+	}
+
+	munmap(map, MMAP_SIZE);
+
+	vma.start = (unsigned long)(uintptr_t)map;
+	vma.end = vma.start + MMAP_SIZE;
+	if (find_new_vma_like(&vma))
+		return 1;
+
+	printf("[OK]\n");
+
+	return 0;
+}
-- 
2.11.0


* Re: [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall
  2017-01-11 18:17 ` [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall Dmitry Safonov
@ 2017-01-11 22:26   ` Andy Lutomirski
  2017-01-12  9:46     ` Dmitry Safonov
                       ` (2 more replies)
  2017-01-12 11:51   ` kbuild test robot
  1 sibling, 3 replies; 8+ messages in thread
From: Andy Lutomirski @ 2017-01-11 22:26 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, X86 ML

On Wed, Jan 11, 2017 at 10:17 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> During fixing CRIU bugs on ZDTM tests for 32-bit C/R, I found that
> compatible ia32/x32 syscalls mmap() and mmap2() can return address
> over 4Gb in x86_64 applications, which results in returning lower
> 4 bytes of address while dropping the higher bytes.
> It happens because mmap() upper limit doesn't differ native/compat
> syscalls for 64-bit task, it's: (TASK_UNMAPPED_BASE + random_factor)
> which is: (PAGE_ALIGN(TASK_SIZE / 3)) + random_factor
> (in case of legacy mmap it's just TASK_SIZE).
> This patch limits higher address that can be mmaped with compat
> syscalls in 64-bit applications with IA32_PAGE_OFFSET (+randomization).
>
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
> ---
>  arch/x86/kernel/sys_x86_64.c | 37 +++++++++++++++++++++++++++++--------
>  1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index a55ed63b9f91..0893725db6e6 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -100,7 +100,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>  static void find_start_end(unsigned long flags, unsigned long *begin,
>                            unsigned long *end)
>  {
> -       if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
> +       if (!test_thread_flag(TIF_ADDR32)) {
>                 /* This is usually used needed to map code in small
>                    model, so it needs to be in the first 31bit. Limit
>                    it to that.  This means we need to move the
> @@ -109,14 +109,24 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>                    malloc knows how to fall back to mmap. Give it 1GB
>                    of playground for now. -AK */
>                 *begin = 0x40000000;
> -               *end = 0x80000000;
> -               if (current->flags & PF_RANDOMIZE) {
> -                       *begin = randomize_page(*begin, 0x02000000);
> +
> +               if (flags & MAP_32BIT) {
> +                       if (current->flags & PF_RANDOMIZE)
> +                               *begin = randomize_page(*begin, 0x02000000);
> +                       *end = 0x80000000;
> +                       return;
> +               }
> +               if (current->thread.status & TS_COMPAT) {
> +                       if (current->flags & PF_RANDOMIZE)
> +                               *begin = randomize_page(*begin,
> +                                       1UL << mmap_rnd_compat_bits);
> +                       *end = IA32_PAGE_OFFSET;
> +                       return;
>                 }
> -       } else {
> -               *begin = current->mm->mmap_legacy_base;
> -               *end = TASK_SIZE;
>         }
> +
> +       *begin = current->mm->mmap_legacy_base;
> +       *end = TASK_SIZE;
>  }
>
>  unsigned long
> @@ -187,10 +197,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>                         return addr;
>         }
>
> +       if (current->thread.status & TS_COMPAT) {

in_compat_syscall(), please.

Also, we need to verify that, if this is called execve(), it does the
right thing.

> +               if (current->flags & PF_RANDOMIZE) {
> +                       unsigned long rnd = 1UL << mmap_rnd_compat_bits;
> +
> +                       info.high_limit =
> +                               randomize_page(IA32_PAGE_OFFSET - rnd, rnd);
> +               } else {
> +                       info.high_limit = IA32_PAGE_OFFSET;
> +               }
> +       } else {
> +               info.high_limit = mm->mmap_base;
> +       }

This code was incomprehensible before and it's worse now.  Could you
try to clean it up a bit?  For example, a patch that simply folds
find_start_end() into its sole caller as the first patch in the series
without changing any semantics would probably help.

--Andy


* Re: [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall
  2017-01-11 22:26   ` Andy Lutomirski
@ 2017-01-12  9:46     ` Dmitry Safonov
  2017-01-12 11:39     ` Dmitry Safonov
  2017-01-12 14:11     ` Dmitry Safonov
  2 siblings, 0 replies; 8+ messages in thread
From: Dmitry Safonov @ 2017-01-12  9:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, X86 ML

On 01/12/2017 01:26 AM, Andy Lutomirski wrote:
> On Wed, Jan 11, 2017 at 10:17 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>> During fixing CRIU bugs on ZDTM tests for 32-bit C/R, I found that
>> compatible ia32/x32 syscalls mmap() and mmap2() can return address
>> over 4Gb in x86_64 applications, which results in returning lower
>> 4 bytes of address while dropping the higher bytes.
>> It happens because mmap() upper limit doesn't differ native/compat
>> syscalls for 64-bit task, it's: (TASK_UNMAPPED_BASE + random_factor)
>> which is: (PAGE_ALIGN(TASK_SIZE / 3)) + random_factor
>> (in case of legacy mmap it's just TASK_SIZE).
>> This patch limits higher address that can be mmaped with compat
>> syscalls in 64-bit applications with IA32_PAGE_OFFSET (+randomization).
>>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>> ---
>>  arch/x86/kernel/sys_x86_64.c | 37 +++++++++++++++++++++++++++++--------
>>  1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
>> index a55ed63b9f91..0893725db6e6 100644
>> --- a/arch/x86/kernel/sys_x86_64.c
>> +++ b/arch/x86/kernel/sys_x86_64.c
>> @@ -100,7 +100,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>>  static void find_start_end(unsigned long flags, unsigned long *begin,
>>                            unsigned long *end)
>>  {
>> -       if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
>> +       if (!test_thread_flag(TIF_ADDR32)) {
>>                 /* This is usually used needed to map code in small
>>                    model, so it needs to be in the first 31bit. Limit
>>                    it to that.  This means we need to move the
>> @@ -109,14 +109,24 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>>                    malloc knows how to fall back to mmap. Give it 1GB
>>                    of playground for now. -AK */
>>                 *begin = 0x40000000;
>> -               *end = 0x80000000;
>> -               if (current->flags & PF_RANDOMIZE) {
>> -                       *begin = randomize_page(*begin, 0x02000000);
>> +
>> +               if (flags & MAP_32BIT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin, 0x02000000);
>> +                       *end = 0x80000000;
>> +                       return;
>> +               }
>> +               if (current->thread.status & TS_COMPAT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin,
>> +                                       1UL << mmap_rnd_compat_bits);
>> +                       *end = IA32_PAGE_OFFSET;
>> +                       return;
>>                 }
>> -       } else {
>> -               *begin = current->mm->mmap_legacy_base;
>> -               *end = TASK_SIZE;
>>         }
>> +
>> +       *begin = current->mm->mmap_legacy_base;
>> +       *end = TASK_SIZE;
>>  }
>>
>>  unsigned long
>> @@ -187,10 +197,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>>                         return addr;
>>         }
>>
>> +       if (current->thread.status & TS_COMPAT) {
>
> in_compat_syscall(), please.

Indeed, I forgot about that helper.

>
> Also, we need to verify that, if this is called execve(), it does the
> right thing.

Hmm, I'm not sure I follow.
Do you mean a test that calls compat sys_execve() from (and for) a
64-bit ELF?

>> +               if (current->flags & PF_RANDOMIZE) {
>> +                       unsigned long rnd = 1UL << mmap_rnd_compat_bits;
>> +
>> +                       info.high_limit =
>> +                               randomize_page(IA32_PAGE_OFFSET - rnd, rnd);
>> +               } else {
>> +                       info.high_limit = IA32_PAGE_OFFSET;
>> +               }
>> +       } else {
>> +               info.high_limit = mm->mmap_base;
>> +       }
>
> This code was incomprehensible before and it's worse now.  Could you
> try to clean it up a bit?  For example, a patch that simply folds
> find_start_end() into its sole caller as the first patch in the series
> without changing any semantics would probably help.

Well, yes, I don't like how this code looks either.
A cleanup will need an extra parameter to find_start_end() saying
whether the allocation is bottom-up or top-down.
I'll try to clean it up for v2.

>
> --Andy
>


-- 
              Dmitry


* Re: [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall
  2017-01-11 22:26   ` Andy Lutomirski
  2017-01-12  9:46     ` Dmitry Safonov
@ 2017-01-12 11:39     ` Dmitry Safonov
  2017-01-12 14:11     ` Dmitry Safonov
  2 siblings, 0 replies; 8+ messages in thread
From: Dmitry Safonov @ 2017-01-12 11:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, X86 ML

On 01/12/2017 01:26 AM, Andy Lutomirski wrote:
> On Wed, Jan 11, 2017 at 10:17 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>> During fixing CRIU bugs on ZDTM tests for 32-bit C/R, I found that
>> compatible ia32/x32 syscalls mmap() and mmap2() can return address
>> over 4Gb in x86_64 applications, which results in returning lower
>> 4 bytes of address while dropping the higher bytes.
>> It happens because mmap() upper limit doesn't differ native/compat
>> syscalls for 64-bit task, it's: (TASK_UNMAPPED_BASE + random_factor)
>> which is: (PAGE_ALIGN(TASK_SIZE / 3)) + random_factor
>> (in case of legacy mmap it's just TASK_SIZE).
>> This patch limits higher address that can be mmaped with compat
>> syscalls in 64-bit applications with IA32_PAGE_OFFSET (+randomization).
>>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>> ---
>>  arch/x86/kernel/sys_x86_64.c | 37 +++++++++++++++++++++++++++++--------
>>  1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
>> index a55ed63b9f91..0893725db6e6 100644
>> --- a/arch/x86/kernel/sys_x86_64.c
>> +++ b/arch/x86/kernel/sys_x86_64.c
>> @@ -100,7 +100,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>>  static void find_start_end(unsigned long flags, unsigned long *begin,
>>                            unsigned long *end)
>>  {
>> -       if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
>> +       if (!test_thread_flag(TIF_ADDR32)) {
>>                 /* This is usually used needed to map code in small
>>                    model, so it needs to be in the first 31bit. Limit
>>                    it to that.  This means we need to move the
>> @@ -109,14 +109,24 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>>                    malloc knows how to fall back to mmap. Give it 1GB
>>                    of playground for now. -AK */
>>                 *begin = 0x40000000;
>> -               *end = 0x80000000;
>> -               if (current->flags & PF_RANDOMIZE) {
>> -                       *begin = randomize_page(*begin, 0x02000000);
>> +
>> +               if (flags & MAP_32BIT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin, 0x02000000);
>> +                       *end = 0x80000000;
>> +                       return;
>> +               }
>> +               if (current->thread.status & TS_COMPAT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin,
>> +                                       1UL << mmap_rnd_compat_bits);
>> +                       *end = IA32_PAGE_OFFSET;
>> +                       return;
>>                 }
>> -       } else {
>> -               *begin = current->mm->mmap_legacy_base;
>> -               *end = TASK_SIZE;
>>         }
>> +
>> +       *begin = current->mm->mmap_legacy_base;
>> +       *end = TASK_SIZE;
>>  }
>>
>>  unsigned long
>> @@ -187,10 +197,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>>                         return addr;
>>         }
>>
>> +       if (current->thread.status & TS_COMPAT) {
>
> in_compat_syscall(), please.
>
> Also, we need to verify that, if this is called execve(), it does the
> right thing.

It definitely does not do the right thing at the moment.
Will fix in v2.

>
>> +               if (current->flags & PF_RANDOMIZE) {
>> +                       unsigned long rnd = 1UL << mmap_rnd_compat_bits;
>> +
>> +                       info.high_limit =
>> +                               randomize_page(IA32_PAGE_OFFSET - rnd, rnd);
>> +               } else {
>> +                       info.high_limit = IA32_PAGE_OFFSET;
>> +               }
>> +       } else {
>> +               info.high_limit = mm->mmap_base;
>> +       }
>
> This code was incomprehensible before and it's worse now.  Could you
> try to clean it up a bit?  For example, a patch that simply folds
> find_start_end() into its sole caller as the first patch in the series
> without changing any semantics would probably help.
>
> --Andy
>


-- 
              Dmitry


* Re: [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall
  2017-01-11 18:17 ` [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall Dmitry Safonov
  2017-01-11 22:26   ` Andy Lutomirski
@ 2017-01-12 11:51   ` kbuild test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kbuild test robot @ 2017-01-12 11:51 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: kbuild-all, linux-kernel, 0x7f454c46, Dmitry Safonov,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Borislav Petkov, x86

[-- Attachment #1: Type: text/plain, Size: 1771 bytes --]

Hi Dmitry,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.10-rc3 next-20170111]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dmitry-Safonov/Fix-compatible-mmap-return-pointer-over-4Gb/20170112-185732
config: x86_64-randconfig-x016-201702 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/kernel/sys_x86_64.c: In function 'find_start_end':
>> arch/x86/kernel/sys_x86_64.c:122:13: error: 'mmap_rnd_compat_bits' undeclared (first use in this function)
         1UL << mmap_rnd_compat_bits);
                ^~~~~~~~~~~~~~~~~~~~
   arch/x86/kernel/sys_x86_64.c:122:13: note: each undeclared identifier is reported only once for each function it appears in
   arch/x86/kernel/sys_x86_64.c: In function 'arch_get_unmapped_area_topdown':
   arch/x86/kernel/sys_x86_64.c:202:31: error: 'mmap_rnd_compat_bits' undeclared (first use in this function)
       unsigned long rnd = 1UL << mmap_rnd_compat_bits;
                                  ^~~~~~~~~~~~~~~~~~~~

vim +/mmap_rnd_compat_bits +122 arch/x86/kernel/sys_x86_64.c

   116				*end = 0x80000000;
   117				return;
   118			}
   119			if (current->thread.status & TS_COMPAT) {
   120				if (current->flags & PF_RANDOMIZE)
   121					*begin = randomize_page(*begin,
 > 122						1UL << mmap_rnd_compat_bits);
   123				*end = IA32_PAGE_OFFSET;
   124				return;
   125			}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29284 bytes --]


* Re: [PATCH 1/2] x86/mm: don't mmap() over 4GB with compat syscall
  2017-01-11 22:26   ` Andy Lutomirski
  2017-01-12  9:46     ` Dmitry Safonov
  2017-01-12 11:39     ` Dmitry Safonov
@ 2017-01-12 14:11     ` Dmitry Safonov
  2 siblings, 0 replies; 8+ messages in thread
From: Dmitry Safonov @ 2017-01-12 14:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Dmitry Safonov, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Andy Lutomirski, Borislav Petkov, X86 ML

On 01/12/2017 01:26 AM, Andy Lutomirski wrote:
> On Wed, Jan 11, 2017 at 10:17 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>> During fixing CRIU bugs on ZDTM tests for 32-bit C/R, I found that
>> compatible ia32/x32 syscalls mmap() and mmap2() can return address
>> over 4Gb in x86_64 applications, which results in returning lower
>> 4 bytes of address while dropping the higher bytes.
>> It happens because mmap() upper limit doesn't differ native/compat
>> syscalls for 64-bit task, it's: (TASK_UNMAPPED_BASE + random_factor)
>> which is: (PAGE_ALIGN(TASK_SIZE / 3)) + random_factor
>> (in case of legacy mmap it's just TASK_SIZE).
>> This patch limits higher address that can be mmaped with compat
>> syscalls in 64-bit applications with IA32_PAGE_OFFSET (+randomization).
>>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>> ---
>>  arch/x86/kernel/sys_x86_64.c | 37 +++++++++++++++++++++++++++++--------
>>  1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
>> index a55ed63b9f91..0893725db6e6 100644
>> --- a/arch/x86/kernel/sys_x86_64.c
>> +++ b/arch/x86/kernel/sys_x86_64.c
>> @@ -100,7 +100,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>>  static void find_start_end(unsigned long flags, unsigned long *begin,
>>                            unsigned long *end)
>>  {
>> -       if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT)) {
>> +       if (!test_thread_flag(TIF_ADDR32)) {
>>                 /* This is usually used needed to map code in small
>>                    model, so it needs to be in the first 31bit. Limit
>>                    it to that.  This means we need to move the
>> @@ -109,14 +109,24 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>>                    malloc knows how to fall back to mmap. Give it 1GB
>>                    of playground for now. -AK */
>>                 *begin = 0x40000000;
>> -               *end = 0x80000000;
>> -               if (current->flags & PF_RANDOMIZE) {
>> -                       *begin = randomize_page(*begin, 0x02000000);
>> +
>> +               if (flags & MAP_32BIT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin, 0x02000000);
>> +                       *end = 0x80000000;
>> +                       return;
>> +               }
>> +               if (current->thread.status & TS_COMPAT) {
>> +                       if (current->flags & PF_RANDOMIZE)
>> +                               *begin = randomize_page(*begin,
>> +                                       1UL << mmap_rnd_compat_bits);
>> +                       *end = IA32_PAGE_OFFSET;
>> +                       return;
>>                 }
>> -       } else {
>> -               *begin = current->mm->mmap_legacy_base;
>> -               *end = TASK_SIZE;
>>         }
>> +
>> +       *begin = current->mm->mmap_legacy_base;
>> +       *end = TASK_SIZE;
>>  }
>>
>>  unsigned long
>> @@ -187,10 +197,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>>                         return addr;
>>         }
>>
>> +       if (current->thread.status & TS_COMPAT) {
>
> in_compat_syscall(), please.
>
> Also, we need to verify that, if this is called execve(), it does the
> right thing.
>
>> +               if (current->flags & PF_RANDOMIZE) {
>> +                       unsigned long rnd = 1UL << mmap_rnd_compat_bits;
>> +
>> +                       info.high_limit =
>> +                               randomize_page(IA32_PAGE_OFFSET - rnd, rnd);
>> +               } else {
>> +                       info.high_limit = IA32_PAGE_OFFSET;
>> +               }
>> +       } else {
>> +               info.high_limit = mm->mmap_base;
>> +       }
>
> This code was incomprehensible before and it's worse now.  Could you
> try to clean it up a bit?  For example, a patch that simply folds
> find_start_end() into its sole caller as the first patch in the series
> without changing any semantics would probably help.

So, I debugged it a little more.
Here it is: we really should mmap not just up to IA32_PAGE_OFFSET,
but also account for the stack of compat applications - we need a
gap at the end of the address space, counted the same way it is in
mmap_base().
And here is another bug (but a less painful one): for 32-bit
applications that set a 64-bit CS and do native x86_64 syscalls,
it's not possible to map above 4Gb (or thereabouts) because of the
32-bit mmap_base.

I think there are two ways to fix these issues:
1. Introduce mmap_compat_base (and mmap_compat_legacy_base, I guess).
Initialize only one of mmap_{,compat}_base in arch_pick_mmap_layout()
and initialize the other only when a 64-bit app does a 32-bit mmap,
and vice versa. Use the two mmap bases accordingly.
2. Save random_factor in arch_pick_mmap_layout() and do the
mmap_base() calculation during mmap(), using the compat or native
task size according to in_compat_syscall() - the performance
overhead in mmap_base() doesn't look that big (but I may be wrong).
We'd also need to be careful with RLIMIT_STACK and fall back to
bottom-up allocation when it's unlimited.

So, does (1) or (2) make sense, or is there a simpler solution to
these bugs?

-- 
              Dmitry

